Using Python To Read a FieldWorks Lexicon
Well, it’s Christmas break for me, which usually means my head gets hungry to learn something new. Alas, that hasn’t happened yet, but I did try something new.
I got into Python for .NET and used it to write a trivial sample that exports from FieldWorks Language Explorer (FLEx) to Standard Format.
The Vision
There is a tendency to view FieldWorks as a “black box”; with SFM/Toolbox, the data may be a completely inconsistent mess, but hey, it’s just text, and you can “get at it”. And while ordinary users have much more ability to munge their data using FLEx’s Bulk Edit tool, there will always be “just one more feature” needed by one or two people. I personally don’t believe helping people write SQL stored procedures is the answer.
Rather, since Python is gaining popularity in field linguistic computing, many people would like to see FLEx provide various levels of support for it. These would include:
A. Write your own Python which talks to the FieldWorks database.
B. Paste a piece of Python, which someone emailed you, into FLEx to do a fix, search, transform, etc. on your data.
C. Set up smart fields which provide default values by running a bit of Python script whenever the record changes. These could then be overridden if the user types in the field. For example, this would work for transductions into other scripts, or for sort keys.
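Scenario C can be tried out in plain Python with no FieldWorks code at all. This is only a rough sketch of the idea: the names SmartField and make_sort_key are made up for illustration and are not part of FLEx or FDO, and the "record" is just a dictionary standing in for a real entry.

```python
import unicodedata


def make_sort_key(headword):
    """Hypothetical default: lowercase the headword and strip diacritics."""
    decomposed = unicodedata.normalize("NFD", headword.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class SmartField(object):
    """A field whose value defaults to compute(record) unless the user typed one."""

    def __init__(self, compute):
        self.compute = compute
        self.override = None  # set when the user types in the field

    def value(self, record):
        if self.override is not None:
            return self.override
        # Recomputed on every read, so it follows changes to the record.
        return self.compute(record)


sort_key = SmartField(lambda record: make_sort_key(record["lexeme"]))
record = {"lexeme": "Émile"}
default_value = sort_key.value(record)   # computed from the record
record["lexeme"] = "Zoë"
updated_value = sort_key.value(record)   # follows the edited record
sort_key.override = "zzz"
typed_value = sort_key.value(record)     # the user's typed value wins
```

The default is recomputed on each read rather than stored, which is one simple way to get "whenever the record changes" without an event system; a real implementation inside FLEx would presumably hook into its own change notifications instead.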
Trying it Out
I decided to start with the first scenario: running a script outside FLEx (not embedded in it) to do a trivial export.
UPDATE:
Ken Zook wrote up a bunch on doing this with IronPython.
Python for .NET, unlike Microsoft’s IronPython, runs under normal Python. This, I think, will make it more accessible to field workers wanting to use their Python skills to import, export, or modify FieldWorks databases. They don’t have to purchase and learn Visual Studio or lose access to some libraries.
I got the latest version out of Subversion (the released version is quite old), applied the bizarre change mentioned in the VS_Readme, battled a bit with the supplied NAnt script, and compiled it (i.e., I think there’s room for us to help this project). The Taiwan earthquake broke our fiber connections out of Asia, but I finally found Python 2.4 in Japan, installed that, then put the clr.dll and runtime into my Python directory. Finally, I installed ActiveState Komodo, told it to use my FieldWorks directory, and wrote the following (this may be only my second and longest Python script ever, so be gentle):
# Python defaults to ASCII, so switch the console to Unicode
import sys, codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

# this is the Python for .NET piece that interfaces us to the .NET
# FieldWorks libraries
import clr
from CLR.System.Reflection import Assembly

# FDO (FieldWorks Data Objects) is the
# Object-Relational-Mapper layer of FieldWorks
fdo = Assembly.LoadWithPartialName("FDO")
from CLR.SIL.FieldWorks.FDO import FdoCache

# open up a language project to work on
db = FdoCache.Create("TestLangProj")
lp = db.LanguageProject
vern = lp.DefaultVernacularWritingSystem
analysisWs = lp.DefaultAnalysisWritingSystem
lexicon = lp.LexicalDatabaseOA

for e in lexicon.EntriesOC:
    # raw strings keep the backslash of each SFM marker literal
    print r"\lx " + e.LexemeFormOA.Form.GetAlternative(vern)
    for sense in e.SensesOS:
        print r"\ge " + sense.Gloss.GetAlternative(analysisWs)
        if sense.MorphoSyntaxAnalysisRA is not None:
            print r"\pos " + sense.MorphoSyntaxAnalysisRA.InterlinearAbbr
    # blank line between entries
    print
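The export loop can also be tried without a FieldWorks installation by separating the formatting from FDO. The sketch below is written for current Python 3 rather than the Python 2.4 used above, and it assumes each entry has already been copied into a plain dictionary. The function name format_sfm, the dictionary keys, and the sample word are all invented for illustration; none of them are FDO names or real data.

```python
def format_sfm(entries):
    """Render entries as Standard Format text using \\lx, \\ge and \\pos markers."""
    lines = []
    for entry in entries:
        lines.append("\\lx " + entry["lexeme"])
        for sense in entry["senses"]:
            lines.append("\\ge " + sense["gloss"])
            # A sense may have no part of speech, like the None check above.
            pos = sense.get("pos")
            if pos is not None:
                lines.append("\\pos " + pos)
        lines.append("")  # blank line between entries
    return "\n".join(lines)


# Made-up sample data standing in for what FDO would supply.
sample = [
    {
        "lexeme": "kaya",
        "senses": [{"gloss": "house", "pos": "n"}, {"gloss": "home"}],
    },
]
sfm_text = format_sfm(sample)
print(sfm_text)
```

Keeping the formatting in a function that takes plain data means the same code could be fed from FDO, from a test, or from some other source.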
Problems
Now, there are some issues. A big one is that you don’t get Visual Studio’s Intellisense to tell you what properties and methods are available in FDO. Is there a way to provide that info to a Python IDE somehow? FDO is generated, so we could generate a file for the IDE. I see Komodo has an XML format for this called CIX. Is there such a format for a popular free Python editor?
There are also problems accessing the COM objects that FDO exposes here and there. For example, if you want access to a multi-lingual or formatted string in FW, you currently can’t get at it from Python. FDO is C# code, but these objects sit below FDO and are C++ wrapped for .NET. We don’t yet know if there is something that could be changed about that wrapping; IronPython has the exact same problem. Anyhow, I’m confident that we could wrap those low-level classes, as needed, to make them accessible to Python scripters.
Questions
I’d like to play around more with this, and the FLEx team may be able to allocate time to it in the future, if folks show enough interest. But first, can anyone suggest typical scenarios? What would you envision wanting to do to a FLEx database? Make sort keys? Do complex selections that can then be used in Bulk Edit? Generate statistics? Out of these actual code scenarios, we could then think about a new namespace for FLEx scripters which would have friendly names, appropriate abstractions, and good documentation.
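As one possible starting point for the statistics scenario, here is a small sketch that works on Standard Format text like the export above rather than on the database itself. It is Python 3, the function name count_markers is hypothetical, and the sample export is made up.

```python
from collections import Counter


def count_markers(sfm_text, marker):
    """Count the values that follow a given SFM marker, e.g. "pos"."""
    prefix = "\\" + marker + " "
    counts = Counter()
    for line in sfm_text.splitlines():
        if line.startswith(prefix):
            counts[line[len(prefix):].strip()] += 1
    return counts


# Made-up export text in the same shape the script above prints.
sample_export = "\n".join([
    r"\lx kaya", r"\ge house", r"\pos n", "",
    r"\lx lakad", r"\ge walk", r"\pos v", r"\ge a walk", r"\pos n", "",
])
pos_counts = count_markers(sample_export, "pos")
```

A friendlier scripting namespace could offer this kind of count directly on the lexicon, so that scripters never have to go through a text export first.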