Major Data Structures

Key Java Classes Used in SILKin

JFile Chooser

The JFileChooser class provides a system-dependent but pretty standard file dialog. Its Method Summary provides all you really need to know. But there is a section of the Java Tutorial that discusses how best to use this class here.

ArrayList

The ArrayList class provides a nice refinement of the standard data array. Java provides for arrays of any data type. For example, an array of ints is declared the same way an int is declared, except a pair of square brackets is appended.
    int[] zipCodes = new int[5];
The above line of code creates an int array of size 5.

As handy as built-in arrays are, the ArrayList proved more useful. Importantly, it is an Object, not a primitive data type. Because Java is an object-oriented language, many of its facilities are built to handle objects. ArrayLists are, by default, lists of objects, but can be restricted to a particular type by appending a type declaration.
    ArrayList<ClauseBody> body = new ArrayList<>();

A major advantage of an ArrayList is that the above line of code creates an array ready for use, even though no size is specified. You can add a new item at the front, the back, or insert at any position. The ArrayList will dynamically adjust its size and re-order its contents in response to any add/delete action. It can be cleared with a single command, etc. It has lots of functionality not available in a standard list. It is just easier to use.

TreeMap

The TreeMap class is another versatile tool that appears in SILKin a lot. It is an efficient implementation of the Red-Black tree, with guaranteed log(n) time cost of most access operations. It is a sorted map, so I frequently used it as a quick sort.

Several important data structures in SILKin are TreeMaps:

  1. Special Relationships (adoptions) in ChartPanel
  2. numerous structures in ClauseBody's routines that support Horn Clause interpretation and simplification, gloss generation, etc.
  3. on a Context, the Learning History, UDPs (both the template and the field on each Individual), and the kinTypePriorityTMap
  4. on a DomainTheory, the theory itself is a TMap. Also definedTerms, nonUmbrellas, potentialUmbrellas, synonyms, umbrellas, overlaps, pos, neg, and potentialCandidates (for the learning module's use)
  5. DyadTMap is a minor extension of TreeMap, containing a nested TreeMap.
  6. the editsInProgress of EditTheoryFrame, plus several internal uses.
  7. the exactEQCs and structEQCs on the KinTermDef.
  8. the KinTermMatrix and KinTypeMatrix classes are TMaps.
  9. Library's activeContexts
  10. bindings in ClauseBody and Literal is a TMap that maps Horn Clause variables to their values
  11. the lookUpTable in MyResBundle

Inner Classes

In general, every class in Java is contained in a file named 'ClassName.java.' However, some classes are small and intended only for a specific purpose; typically to be used in a larger class. Because of those cases, Java allows the programmer to define a class inside the definition of another class; this is an Inner Class. In the name space, if the class BigClass contains within it an inner class called SmallClass, the name of the inner class is 'BigClass.SmallClass.'

For example:
The LibBrowser class is a GUI frame with several buttons and menus. It is convenient to define a separate inner class (that extends Java's ActionListener class) to listen for activity on each menu or button. So LibBrowser has seven such inner classes. It also has an inner class that defines a special data structure used only by LibBrowser: TermTriple. Any other class wishing to reference a TermTriple must use its full name: LibBrowser.TermTriple. And its Java documentation is in the file LibBrowser.TermTriple.html.

ActionListener

Every gesture made by a user in a GUI generates lots of ActionEvents. If the programmer has created a class that extends Java's ActionListener class, and has 'attached' that listener to the GUI object that generates the ActionEvents, then the listener's code will be called every time an ActionEvent is detected. The code of an ActionListener can access the details of the ActionEvent to determine whether this particular action is of interest. In SILKin, whenever I had a multipurpose listener I assigned an ActionCommand that was a String describing the user's action. Then my listener would test the text of the ActionCommand to determine which code to run.

Early in the project, when I was fairly new to Java, I would assign a single listener to an entire menu bar (see MainPane). This proved to be awkward, and I later shifted to creating a special Inner Class listener for each GUI element.

NetBeans GUI Builder Forms

The User Interface of SILKin is an inconsistent, chaotic mess for which I take full responsibility. The academic portion was written in 2001 - 2006, while Java was still a fairly young language. MainPane was originally the main class, and its GUI was built as a convenience for my research. I literally copied the framework from a book example and modified it for my purposes. It used the Javax windowing tools.

The data-gathering and tree-drawing screen, SIL_Edit was adapted from the KAES system written even earlier by Michael D. Fischer of the Centre for Social Anthropology and Computing, University of Kent, U.K. It used the precursor to Javax, the Java AWT (Abstract Windowing Toolkit).

The later portions of the code, added after 2006, were built using Sun's (later Oracle's) proprietary IDE: NetBeans. It includes a graphical GUI builder that allows 'drag and drop' creation of a screen. When all the elements have been named, located, and their visual behaviors specified, a '.form' file is generated that captures the design. Every time the java code file is opened for edit, the GUI portion is generated from the form file. The generated code is placed in a special method called 'initComponents()' that is not editable by the programmer. But all the GUI objects created by the generated code are fully accessible, so any of their characteristics or behaviors can be amended. My convention was to place my GUI-modifying code in a method called 'prepComponents().'

The NetBeans-generated code always creates a special inline anonymous ActionListener attached to every GUI element; all that listener does is invoke a special method. That special method is also generated and non-editable, but the programmer can insert his code in its body.

The GUI builder was a great time saver. But it had a few bugs that caused headaches. In one class, it broke when the class file got really large and I never could fix it. The solution (in case you ever need it) was to copy the entire class file to a plain code editor (BBEdit, in my case) and strip out all the editor commands that made generated code off limits. Then read it back into NetBeans and continue as though I had written the Javax code that had been generated.

Another problem encountered with the GUI builder was some clunkiness in dragging and dropping. In one case it allowed me to place a JScrollPane and then place a text pane inside it. But the bottom of the scroll pane got covered up so that no matter what I did, the horizontal scroll bars were never visible. That drove me nuts until I found it.

If it is ever necessary to adjust or amend a screen, I'd recommend trying to just revise the '.form' file via the NetBeans facility, but be aware of its drawbacks.

Key SILKin Classes

There are over 175 classes in SILKin. Many of them are self-evident and others are utilities you will probably never look at. Below are the key classes that the code is constantly addressing and processing.

Context

Each language project, each oral culture that is targeted, constitutes one context. So everything that we learn about that culture is stored on the Context object. Between sessions, the data in the Context is stored in a SILK file. When the User is entering kinship data in the SIL_Edit screen, they are editing their Context and one (perhaps the only) DomainTheory for that Context. The following fields of a Context object hold key context-wide information.

        Individual Census

The individualCensus field of a Context holds an ArrayList<Individual>. If I were starting from scratch, all census fields would be TreeMaps, not ArrayLists. But the KAES code I adapted already had a lot of functionality built on top of arrays of individuals and families. It was much easier to convert those arrays to ArrayLists than to TreeMaps. Because each Individual created carries a system-assigned, zero-based serialNmbr, that number is also their index in the ArrayList so long as I never delete an internal item. Therefore, one of the weird facts of SILKin is that unless the User is deleting the last Individual created (an 'oops'), the Individual is not deleted; their object is just marked 'deleted' and thereafter is skipped in all processing. But through the ContextEditor, a User can undelete a person or family.

                 Individual

The Individual class is an extension of the Person class. That is an awkward arrangement, but an early goal of SILKin was to create structured data that could be imported to the KAES system. (That goal was quickly abandoned at their end.) The Person class has all the variables and methods that KAES created, and the Individual class has my additions. Much of the KAES functionality has been superseded, but is retained for historical purposes.

The Individual class contains several key fields:

The Individual class inherits several important fields from its superior class, Person.

        Family Census

The familyCensus field of a Context holds an ArrayList<Family>. Just as with Individual, this is an ArrayList because it was much easier to convert KAES arrays to ArrayLists. Because each Family created carries a system-assigned, zero-based serialNmbr, that number is also their index in the ArrayList, and therefore unless the User is deleting the last Family created, the Family is not deleted; their object is just marked 'deleted' and thereafter is skipped in all processing. In the ContextEditor a User can undelete a person or a family.

                 Family

The Family class is an extension of the Marriage class. The Marriage class has all the variables and methods that KAES created, and the Family class has my additions. As with Individuals, KAES functionality has been superseded, but is retained for historical purposes.

A Family object has only a few important fields.

NOTE: SILKin follows a convention advocated by the field workers. If a marriage is ended by divorce, the marriage symbol is drawn with a diagonal line through it (similar to the symbol for a dead person). But if a union or marriage is ended by the death of a spouse, the union symbol does not have the diagonal line. The 'dead' symbol for the spouse is sufficient.

The Family class inherits several important fields from its superior class, Marriage.

BirthGroup is an inner class in the Marriage class. It was created to properly display twins, triplets, etc. A single-born child is displayed as an Individual below their birth family, connected by a vertical line up to the horizontal line from which all children descend. By convention, twins are connected to the horizontal line by a pair of diagonal lines descending from a single point. (See the diagram below.)

                 Singles & Twins on a Chart

Because the twins feature was added late to the drawing code, every child in a Family is stored twice: in the children field and in the birthGrps field. A single-born child is the only member of their BirthGroup. A BirthGroup with multiple members represents multiple births. The definition of multiple birth in SILKin is two or more children born in the same month and year. Whenever a person's birth month or year is edited, we check to see if a multiple birth may have been created or deleted by re-computing the BirthGroups in that person's birthFamily.

        Link Census

The linkCensus field of a Context holds an ArrayList<Link>. This is an ArrayList to be consistent with the structures used for Individuals and Families. KAES has no links because KAES does not allow for multi-chart diagrams. Because a link's serial number is its index into this ArrayList, we follow the same convention re: deletions as for Individuals and Families.

                 Link

The Link class gives us the link symbols that are placed on a chart to represent an Individual whose home chart is elsewhere but has a relationship to someone on this chart. (Links are discussed in the Help files.)

Because a link is essentially just a pointer, it has only a few important fields:

        Kin Term Matrix

The ktm field of a Context holds an instance of the KinTermMatrix class. This is the repository of all kinship data the User has gathered. It is implemented as a nested TreeMap (i.e. a sparse array) but should be thought of as a two-dimensional matrix. Each row of the ktm represents an Ego in this context; row 12 represents the person with serial number 12. Each column represents an Alter. So the 5th cell in the 3rd row contains all the kin terms that #3 would call #5, and (if there is a distinct set of Terms of Address) how #3 would address #5.

These cells are, of course, nodes as discussed above. Whenever a new Ego is selected by the User, every Individual in the context places in their node field (a pointer to) their cell from the row representing the current Ego. As Alter's kin terms are entered in the Detail Display for Alter they are stored on Alter's node, i.e. Alter's cell in the KinTermMatrix.

A KinTermMatrix has only one field, the matrix. This is a nested TreeMap: TreeMap<Integer, TreeMap<Integer, Node>>. The top-level key is an Integer equal to Ego's serialNmbr and its value is another TreeMap whose keys are Integers equal to each possible Alter's serialNmbr and whose values are Nodes.

The data re: what an Ego calls an Alter is stored in other places in SILKin for computational convenience. But the KinTermMatrix is considered the 'master copy' from which all the others can be derived.

        Kin Type Index

The kti field of a Context holds an instance of the KinTypeIndex class. This is one of the 'computational convenience' structures referred to above. It is a fast way to access pairs of Individuals who satisfy a particular structural relationship; this is useful when making a data request during a learning session.

There are only two important fields on a KinTypeIndex:

The KinTypeIndex is rebuilt from the KinTermMatrix when a chartable UDP is created, modified or deleted. It is also rebuilt when a person is removed from a family. When we are building examples of a particular Horn Clause for learning purposes, we create a special, hypothetical context for learning and maintain the ktm and kti of that context.

DomainTheory

Every context has one or two domain theories; two if there are distinct Kin Terms of Address. A DomainTheory is composed of definitions for kin terms in Horn Clause syntax. It also can have a number of attributes (that were documented in the file specification here). Because a domain theory is the focus of our attention in a learning session or whenever we are reasoning about kin term definitions, there are a lot of methods and fields in this class. The class became so large that I split it into 3 different files:

The principal fields of a domain theory, as a data structure, are these:

        Reference vs. Address

Most cultures use the same kin terms to refer to a kinsman and to address them. But a few have different rules for addressing, so every context allows for up to two domain theories. When the User is gathering data, a toggle in the 'Context' menu declares whether there are separate terms of address in this culture.

When there are no separate terms of address (the default) then every time the User enters the kin term that Ego calls Alter, SILKin automatically records that as the term Ego uses to address Alter. Those default terms of address will appear in the Detail Display of the SIL_Edit screen, but they are not editable.

If at some point the User discovers that separate terms of address exist, she can change the toggle in the 'Context' menu. That will make the Address Terms field in the Detail Display editable (so she can make retroactive changes) and henceforth she will need to manually enter the terms of address as well as the terms of reference.

Because a context's address terms are a separate domain theory, anytime the User calls for a learning session ('Get New Suggestions') SILKin will analyze both domain theories and generate issuesForUser on each. In that case, when she decides to 'Act on Suggestions' she must choose which set to work on.

        Horn Clause Syntax

                 KinTermDef

The theory field of a DomainTheory holds a TreeMap<String, KinTermDef> where the key is the kin term and the value is the data structure that will define it in Horn Clause syntax, a KinTermDef. An instance of a KinTermDef (often referred to as a ktd) has these important fields:

NOTE: At the end of my stewardship of the SILKin code, glosses are involved in an unresolved issue. In English and many other languages, we express a relationship verbally by saying 'Bob is John's father's sister's husband.' A gloss is exactly that: 'father's sister's husband.' The PCString that is used in the code to represent a kin type (e.g. 'FaSisHu') is structurally identical to the gloss in English.

When the User chooses to see menus and screens in French, everything they see is in French, including all suggestions, except for a gloss. In French they would say Bob is John's husband of the sister of his father, or 'Hu de Sis de Fa.' Since Gary Simons is the author of the XSL files that format suggestions into English or French, he must play a key role in re-formatting a gloss. Unfortunately, the press of other duties has prevented him from doing so. Therefore, the gloss shown to a French-speaking User is still in the English style. This is not a major flaw since very few users pay any attention to suggestions. But the issue should be resolved.

                 ClauseBody

The primary purpose of a KinTermDef is, of course, to define a kin term in Horn Clause syntax. The definitions and expandedDefs fields contain instances of the ClauseBody class. As a data structure a ClauseBody has these fields:

                 Literal

The Literal class models the components of a clause. Just like DomainTheory, this class is the location of much of the action when learning or reasoning about Horn Clauses and has many methods. So the code is broken into three files:

  1. LiteralAbstract1 is the base class, and is abstract. It is the physical location of a Literal's fields.
  2. LiteralAbstract2 contains more methods. It is also abstract.
  3. Literal is the class that can be instantiated.

As a data structure, a Literal has only two significant fields:

                 Argument

The Argument class is an abstract class that is the superclass of anything that can be an argument to a Horn Clause. It has four instantiable subclasses:

All subclasses of Argument inherit these fields:

For example: Here is a Horn Clause (one of four for the English term 'uncle'):

        uncle(Alter, Ego) :- father(B, Ego), brother(Alter, B).

This entire Horn Clause is a ClauseBody, one of four in the definitions field of the KinTermDef for 'uncle.' The literal 'uncle(Alter, Ego)' is the headPred. There are two literals in the body field's ArrayList. In the first literal, 'father' is the predicate (a primitive one) and the two arguments are 'B' and 'Ego.' Both of those args begin with a capital letter, so they are variables.

Dyads Defined/Undefined

As a user gathers data about the target culture, she records what a particular Ego calls a particular Alter (and also possibly distinct address terms). This information is stored directly on the Node for Alter that is part of Ego's row in the KinTermMatrix. When anthropologists think about this data, they think in terms of dyads. A dyad is defined as a triple: a pair of individuals and the kin term. When an anthropologist has gathered enough dyads that all have the same kin term, she will try to figure out the pattern and deduce the definition of that kin term.

SILKin creates instances of the Dyad class as the user gathers data. If the kin term already has a definition in the appropriate DomainTheory, the dyad will be stored on that domain theory's dyadsDefined field which is a DyadTMap. If no definition exists yet, the dyad will be stored on another DyadTMap, dyadsUndefined.

As a data structure, a Dyad has these important fields:

A DyadTMap is a convenient extension of the TreeMap class. It simply adds one method to the normal TreeMap methods: dyAdd(a-dyad). The structure of the nested TreeMap in a DyadTMap is TreeMap<String, TreeMap<String, ArrayList<Dyad>>>. The top-level key is the kin term, and its value is the second-level TMap. The second-level key is the pcString and the value is a list of all the dyads for this kin term that have that pcString. The pcString, of course, is the index into the ClauseIndex that plays a central role in learning sessions.

Issues for Users (Suggestions)

Although users frequently ignore suggestions, the learning sessions that produce suggestions are the academic focus of SILKin. The process for generating Suggestions is discussed above. The data structures for suggestions have the abstract superclass Issue. All of its subclasses inherit two fields:

There are seven subclasses that can be instantiated; one for each type of suggestion SILKin might make.

  1. An Anomaly is generated when a small number of dyads seem to violate a candidate definition that is otherwise a good fit. It has two significant fields:
  2. A ProposedDef is generated when SILKin has found a definition in the Library of known kinship systems that covers all the kin types seen so far for the kin term being learned (and perhaps a few more). It has two significant fields:
  3. A ComposedDef is built by induction when SILKin has failed to find any Library definitions that might be a good fit. SILKin will attempt to compose a definition only if the 'Do Pure Induction' option is selected in the 'Edit Prefs' screen. This is exceedingly rare. A ComposedDef has three fields that document how it was built:
  4. A SynonymCandidate is generated when SILKin has found dyads that seem to have the same definition, but they have two different kin terms. There are too many dyads of each type for this to be a few errors. A SynonymCandidate has four fields:
  5. An UmbrellaCandidate is generated when SILKin has found two or more kin terms that have numerous dyads; i.e. they seem to have solid definitions. But there is also a kin term that seems to cover all of these kin types. In English, a good example is the term 'grandparent.' This term covers both grandfathers and grandmothers; it is an umbrella term. An UmbrellaCandidate has only two fields:
  6. An OverlapCandidate is generated when SILKin has found two kin terms that seem to have solid definitions, but they both cover a particular kin type. Neither one is an umbrella that covers all of the other. And they aren't synonyms. In a few languages, these overlaps occur. As a data structure, an OverlapCandidate has four fields:
  7. A DataRequest is generated when SILKin has found two or more kin term definitions in the Library that cover all the kin types of the kin term we're trying to learn. But each of the candidates also covers some other kin types. A data request identifies the most powerful new dyad(s) that can eliminate the most candidates. If there are two or more persons already in this context that have the kin type we are seeking, SILKin will ask the User to provide a kin term for that pair. Otherwise, it will describe the kin type and ask User to find two such persons and record their kin term. A DataRequest has only one field:

Parsers

A parser is not a data structure, but the parsers in SILKin create most of the data structures, so I felt they should be discussed in this section.

There are several open source parsers available that could have been configured to handle the various file formats SILKin uses. But my first significant programming task in Java was a class project: building a Java parser. Because I had a template and experience, I decided to build a parser for human-prepared domain theory files, i.e. kinship systems that I had created from my anthropological research. This allowed freedom, flexibility, and precision that I found valuable, so I ended up hand-coding all parsers in SILKin.

My ParserDomainTheory and ParserGEDCOM evolved slightly over time, but were pretty stable over the entire project. My initial format for a SILK file was strictly ad hoc, and my ParserSILKFilePreXML (originally just SILKParser) reflected that. Gary Simons suggested that making the SILK file a proper XML format would be profitable, so the format was changed and a new ParserSILKFile was written. Thus, there are only three parsers you will ever need to deal with, and all three follow the same template.

ParserDomainTheory and ParserSILKFile were considered foundational, so I prepared a Context Free Grammar documenting each of them: DomainTheoryGrammar and SILKFileGrammar. These grammars are not guaranteed to be up-to-date, but they are a good map to the parsers. The names of the parsing methods correspond to the elements in the grammars. ParserSILKFile is technically a subclass of ParserDomainTheory, but it turned out to not inherit much of value from the superclass except the error() method.