The JFileChooser class provides a system-dependent but pretty standard
file dialog. Its Method Summary provides all you really need to know. But there is a
section of the Java Tutorial that discusses how
best to use this class
here.
ArrayList
The ArrayList class provides a nice refinement of the standard data array.
Java provides for arrays of any data type. For example, an array of ints is declared the
same way an int is declared, except a pair of square brackets is appended. int[] zipCodes = new int[5];
The above line of code creates an int array of size 5.
As handy as built-in arrays are, the ArrayList proved more useful. Importantly, it is
an Object, not a primitive data type. Because Java is an object-oriented language, many
of its facilities are built to handle objects. ArrayLists are, by default, lists of objects,
but can be restricted to a particular type by appending a type declaration. ArrayList<ClauseBody> body = new ArrayList<>();
A major advantage of an ArrayList is that the above line of code creates an array
ready for use, even though no size is specified. You can add a new item at the front, the
back, or insert at any position. The ArrayList will dynamically adjust its size and re-order
its contents in response to any add/delete action. It can be cleared with a single command, etc.
It has lots of functionality not available in a standard list. It is just easier to use.
TreeMap
The TreeMap class is another versatile tool that appears in SILKin
a lot.
It is an efficient implementation of the Red-Black
tree, with guaranteed log(n) time cost of most access operations. It is a sorted map, so I
frequently used it as a quick sort.
Several important data structures in SILKin are TreeMaps:
Special Relationships (adoptions) in ChartPanel
numerous structures in ClauseBody's routines that support Horn Clause interpretation
and simplification, gloss generation, etc.
on a Context, the Learning History, UDPs (both the template and the field on each
Individual), and the kinTypePriorityTMap
on a DomainTheory, the theory itself is a TMap. Also definedTerms, nonUmbrellas,
potentialUmbrellas, synonyms, umbrellas, overlaps, pos, neg, and potentialCandidates (for the learning module's use)
DyadTMap is a minor extension of TreeMap, containing a nested TreeMap.
the editsInProgress of EditTheoryFrame, plus several internal uses.
the exactEQCs and structEQCs on the KinTermDef.
the KinTermMatrix and KinTypeMatrix classes are TMaps.
Library's activeContexts
bindings in ClauseBody and Literal is a TMap that maps Horn Clause variables to their values
the lookUpTable in MyResBundle
Inner Classes
In general, every class in Java is contained in a file named 'ClassName.java.' However,
some classes are small and intended only for a specific purpose; typically to be used in a larger
class. Because of those cases, Java allows the programmer to define a class inside the
definition of another class; this is an Inner Class. In the name space, if the class
BigClass contains within it an inner class called SmallClass, the name of the inner class
is 'BigClass.SmallClass.'
For example:
The LibBrowser
class is a GUI frame with several buttons and menus. It is convenient to define a separate
inner class (that extends Java's ActionListener class) to listen for activity on each menu
or button. So LibBrowser has seven such inner classes. It also has an inner class that
defines a special data structure used only by LibBrowser:
TermTriple. Any other class wishing to reference a
TermTriple must use its full name: LibBrowser.TermTriple. And its Java documentation is in
the file LibBrowser.TermTriple.html.
ActionListener
Every gesture made by a user in a GUI generates lots of ActionEvents. If the
programmer has created a class that extends Java's ActionListener class, and has 'attached'
that listener to the GUI object that generates the ActionEvents, then the listener's
code will be called every time an ActionEvent is detected. The code of an ActionListener can
access the details of the ActionEvent to determine whether this particular action is of
interest. In SILKin, whenever I had a multipurpose listener I assigned an ActionCommand
that was a String describing the user's action. Then my listener would test the text of the
ActionCommand to determine which code to run.
Early in the project, when I was fairly new to Java, I would assign a single listener
to an entire menu bar (see
MainPane). This
proved to be awkward, and I later shifted to creating a special Inner Class listener for
each GUI element.
NetBeans GUI Builder Forms
The User Interface of SILKin is an inconsistent, chaotic mess for which I take full
responsibility. The academic portion was written in 2001 - 2006, while Java was still a
fairly young language. MainPane was originally the main class, and its GUI was built as
a convenience for my research. I literally copied the framework from a book example and
modified it for my purposes. It used the Javax windowing tools.
The data-gathering and tree-drawing screen, SIL_Edit was adapted from the KAES system
written even earlier by Michael D. Fischer of the Centre for Social Anthropology and
Computing, University of Kent, U.K. It used the precursor to Javax, the Java AWT (Abstract
Windowing Toolkit).
The later portions of the code, added after 2006, were built using Sun's (later Oracle's)
proprietary IDE: NetBeans. It includes a graphical GUI builder that allows 'drag and drop'
creation of a screen. When all the elements have been named, located, and their visual
behaviors specified, a '.form' file is generated that captures the design. Every time the
java code file is opened for edit, the GUI portion is generated from the form file. The
generated code is placed in a special method called 'initComponents()' that is not editable
by the programmer. But all the GUI objects created by the generated code are fully accessible,
so any of their characteristics or behaviors can be amended. My convention was to place my
GUI-modifying code in a method called 'prepComponents().'
The NetBeans-generated code always creates a special inline anonymous ActionListener
attached to every GUI element; all that listener does is invoke a special method. That
special method is also generated and non-editable, but the programmer can insert his code
in its body.
The GUI builder was a great time saver. But it had a few bugs that caused headaches. In
one class, it broke when the class file got really large and I never could fix it. The
solution (in case you ever need it) was to copy the entire class file to a plain code editor
(BBEdit, in my case) and strip out all the editor commands that made generated code
off limits. Then read it back into NetBeans and continue as though I had written the Javax
code that had been generated.
Another problem encountered with the GUI builder was some clunkiness in dragging and
dropping. In one case it allowed me to place a JScrollPane and then place a text pane
inside it. But the bottom of the scroll pane got covered up so that no matter what I did,
the horizontal scroll bars were never visible. That drove me nuts until I found it.
If it is ever necessary to adjust or amend a screen, I'd recommend trying to just revise
the '.form' file via the NetBeans facility, but be aware of its drawbacks.
Key SILKin Classes
There are over 175 classes in SILKin. Many of them are self-evident and others are
utilities you will probably never look at. Below are the key classes that the code is
constantly addressing and processing.
Context
Each language project, each oral culture that is targeted, constitutes one context. So
everything that we learn about that culture is stored on the
Context object. Between sessions,
the data in the Context is stored in a SILK file. When the
User is entering kinship data in the SIL_Edit screen, they are editing their Context and
one (perhaps the only) DomainTheory for that Context. The following fields of a Context
object hold key context-wide information.
Individual Census
The individualCensus field of a Context holds
an ArrayList<Individual>.
If I were starting from scratch, all census fields would be TreeMaps, not ArrayLists.
But the KAES code I adapted already had a lot of functionality built on top of arrays
of individuals and families. It was much easier to convert those arrays to ArrayLists
than to TreeMaps. Because each Individual created carries a system-assigned, zero-based
serialNmbr, that number is also their index in the ArrayList
so long as I never delete an internal item.
Therefore, one of the weird facts of SILKin is that unless the User is deleting the last
Individual created (an 'oops'), the Individual is not deleted; their object is just marked
'deleted' and thereafter is skipped in all processing. But through the ContextEditor, a
User can undelete a person or family.
Individual
The Individual class is
an extension of the Person class.
That is an awkward arrangement,
but an early goal of SILKin was to create structured data that could be imported to the
KAES system. (That goal was quickly abandoned at their end.) The Person class has all the
variables and methods that KAES created, and the Individual class has my additions. Much of
the KAES functionality has been superseded, but is retained for historical purposes.
The Individual class contains several key fields:
the birthFamily field holds a pointer to the
Family object in which this Individual is a child.
NOTE:
in Java there are no explicit pointers. Whenever you set a field equal to an object, or add
an object to a list, you are really adding a pointer. Hereafter, I will simply say what a field
'holds.'
the marriages field holds an ArrayList<Family>.
This allows for unlimited 'Unions' of any type:
serial monogamous marriages
concurrent polygamous marriages
unsanctioned unions that produced offspring
the User Defined Property field holds a
TreeMap of UDPs, or null if there are none defined for this Context. This TreeMap has
keys equal to the names of the pre-defined UDPs (that must start with an asterisk), and values
are UDP objects instantiated with data personal to this Individual.
A UserDefinedProperty
object has several important properties (fields):
chartable is a boolean field that indicates
whether this UDP should appear on charts, and chartColor
is a Java color code that specifies the color line to be drawn to any object connected by
this UDP.
defaultValue holds an object that is automatically
assigned as this Individual's UDP value when the Individual is created. The User can change
the value in the Person Detail Display in SIL_Edit. This field is null if no default
was declared when the UDP was created on this Context.
maxVal and minVal
are numbers that specify the permissible range of values for this UDP. Only non-null for
numeric-valued UDPs and if specified by the User when this UDP was created.
singleValue is a boolean indicating whether or
not this UDP can only have one value at a time. Specified by the User at UDP creation.
type and validEntries
declare the data type of this UDP's values, and (if non-null) the discrete list of permissible
values. These are declared by the User at UDP creation.
Unlike all of the above, value and
connects may be different for each Individual. The
value field holds an ArrayList<Object> to allow for possible multiple values of any
type. Because type checking is not performed automatically by the ArrayList, SILKin does it
explicitly when the User enters a proposed value via the UDP box on the Individual Detail
Display in SIL_Edit.
The connects field holds a boolean that determines
if this UDP is allowed to link Individuals on similar (or dissimilar) values.
the node field on an Individual holds a
Node object. All the information
about kin terms is contained in Nodes. The node currently attached to Individual #23
specifies what the currently-designated Ego calls #23 when #23 is Alter. Every time the
User designates a new Ego, every Individual in the Context must store their current node
(for the old Ego)
on the Kin Term Matrix and pull from that matrix the node
appropriate for the new Ego. A node has these fields:
kinTermsRef and
kinTermsAdr are ArrayLists of Strings, each of which
is a kin term in the Context. extKinTermsRef and
extKinTermsAdr are similar, but hold the kin terms
appropriate under extended meanings of terms.
the pcString field holds a string that is the
concatenation of relationship symbols that describe the path from the current Ego to the
Individual to whom this node belongs. For example, if Alter (this Individual) is a male maternal
first cousin of Ego in English kinship, then the pcString would be 'MoBroSo' or 'MoSisSo'.
the indiv field holds a back-pointer to the
Individual who 'owns' this node.
The Individual class inherits several important fields from its superior class,
Person.
the homeChart field holds a capital letter
indicating on which chart (of a multi-chart diagram) this individual's symbol should be
drawn.
the location field holds a Point (x,y)
where the upper left corner of this individual's symbol will be drawn.
the sex field holds a
Kind. The Kind class is not abstract,
but this field is always filled by a subclass of Kind:
The sex field not only designates gender, but each subclass of Kind has methods for
drawing its symbol on a chart.
NOTE: that KAES allowed for creation of neuter
individuals; SILKin does not. Horn Clauses may contain variables of unspecified gender, but
normally the gender is implied by some other literal in the clause. When the gender is not
implied, SILKin will treat it as we-don't-care and arbitrarily instantiate the variable as
a Male or a Female.
Family Census
The familyCensus field of a Context holds
an ArrayList<Family>.
Just as with Individual, this is an ArrayList because it was much easier to convert KAES
arrays to ArrayLists. Because each Family created carries a system-assigned, zero-based
serialNmbr, that number is also their index in the ArrayList, and therefore unless the User
is deleting the last Family created, the Family is not deleted; their object is just marked
'deleted' and thereafter is skipped in all processing. In the ContextEditor a
User can undelete a person or a family.
Family
The Family class is
an extension of the Marriage class.
The Marriage class has all the variables and methods that KAES created, and the Family
class has my additions. As with Individuals, KAES functionality has been superseded,
but is retained for historical purposes.
A Family object has only a few important fields.
The marriageMM and
marriageDD fields hold the month and day of the
creation date of this union.
The divorceMM and
divorceDD fields hold the month and day of the
end date of this union.
NOTE: SILKin follows a convention advocated by
the field workers. If a marriage is ended by divorce, the marriage symbol is drawn with a
diagonal line through it (similar to the symbol for a dead person). But if a union or marriage
is ended by the death of a spouse, the union symbol does not have the diagonal line. The
'dead' symbol for the spouse is sufficient.
The Family class inherits several important fields from its superior class, Marriage.
the reason field holds a String that, if non-null,
indicates why this union ended.
the homeChart field holds a capital letter
indicating on which chart (of a multi-chart diagram) this family's symbol should be
drawn.
the location field holds a Point (x,y)
where the upper left corner of this family's symbol will be drawn.
the husband and wife
fields hold pointers to the two parents.
the children field holds an ArrayList<Individual>,
the list of biological children from this union. (Adopted children are linked via a
chartable UDP.)
the birthGrps field holds an ArrayList<BirthGroup>.
BirthGroup is an inner
class in the Marriage class. It was created to properly display twins,
triplets, etc. A single-born child is displayed as an Individual below their birth family,
connected by a vertical line up to the horizontal line from which all children descend. By
convention, twins are connected to the horizontal line by a pair of diagonal lines descending
from a single point. (See the diagram below.)
Because the twins feature was added late to the drawing code, every child in a Family is
stored twice: in the children field and in the
birthGrps field. A single-born child is the only
member of their BirthGroup. A BirthGroup with multiple members represents multiple births.
The definition of multiple birth in SILKin is two or more children born in the same month
and year. Whenever a person's birth month or year is edited, we check to see if a multiple
birth may have been created or deleted by re-computing the BirthGroups in that person's
birthFamily.
Link Census
The linkCensus field of a Context holds
an ArrayList<Link>.
This is an ArrayList to be consistent with the structures used for Individuals and Families. KAES
has no links because KAES does not allow for multi-chart diagrams. Because a link's serial
number is its index into this ArrayList, we follow the same convention re: deletions as for
Individuals and Families.
Link
The Link class gives us the link
symbols that are placed on a chart to represent an Individual whose home chart is elsewhere but
has a relationship to someone on this chart. (Links are discussed in the
Help files.)
Because a link is essentially just a pointer, it has only a few important fields:
the deleted field is a boolean.
the homeChart field is a capital letter indicating
on which chart the link symbol should be drawn.
the location field is a Point (x,y) where the
upper left corner of the link symbol should be drawn.
the personPointedTo field holds the Individual
that this link represents.
the sex field is just like the corresponding
field on an Individual. It holds a Kind object that knows how to draw the male or female
symbols.
Kin Term Matrix
The ktm field of a Context holds an instance of
the KinTermMatrix class. This
is the repository of all kinship data the User has gathered. It is implemented as a nested TreeMap
(i.e. a sparse array) but should be thought of as a two-dimensional matrix. Each row of the
ktm represents an Ego in this context; row 12 represents the person with serial number 12.
Each column represents an Alter. So the 5th cell in the 3rd row contains all the kin terms
that #3 would call #5, and (if there is a distinct set of Terms of Address) how #3 would
address #5.
These cells are, of course, nodes as discussed above. Whenever
a new Ego is selected by the User, every Individual in the context places in their
node field (a pointer to) their cell from the row
representing the current Ego. As Alter's kin terms are entered in the Detail Display for
Alter they are stored on Alter's node, i.e. Alter's cell in the KinTermMatrix.
A KinTermMatrix has only one field, the matrix.
This is a nested TreeMap: TreeMap<Integer, TreeMap<Integer, Node>>. The top-level
key is an Integer equal to Ego's serialNmbr and its value is another TreeMap whose keys
are Integers equal to each possible Alter's serialNmbr and whose values are Nodes.
The data re: what an Ego calls an Alter is stored in other places in SILKin for computational
convenience. But the KinTermMatrix is considered the 'master copy' from which all the others
can be derived.
Kin Type Index
The kti field of a Context holds an instance of
the KinTypeIndex class. This
is one of the 'computational convenience' structures referred to above. It is a fast way to
access pairs of Individuals who satisfy a particular structural relationship; this is useful
when making a data request during a learning session.
There are only two important fields on a KinTypeIndex:
tmap is a nested TreeMap:
TreeMap<String, TreeMap<Integer, ArrayList<Integer>>>. The top-level key
is a PCString representing the path between two persons. The key of the inner TreeMap is an
Integer equal to the serialNmbr of an Ego. The value is an ArrayList of Integers equal to
the serialNmbrs of every person who has that structural relationship to that Ego.
lastSerial is the highest serialNmbr in the
individualCensus when the kti was last updated.
The KinTypeIndex is rebuilt from the KinTermMatrix when a chartable UDP is created,
modified or deleted. It is also rebuilt when a person is removed from a family. When we
are building examples of a particular Horn Clause for learning purposes, we create a
special, hypothetical context for learning and maintain the ktm and kti of that context.
DomainTheory
Every context has one or two domain theories; two if there are distinct Kin Terms of Address.
A DomainTheory is composed
of definitions for kin terms in Horn Clause syntax.
It also can have a number of attributes (that were documented in the file specification
here). Because a domain theory is the focus of our
attention in a learning session or whenever we are reasoning about kin term definitions,
there are a lot of methods and fields in this class. The class became so large that I
split it into 3 different files:
DT_Abstract1 is the basic
class. It is abstract, so it cannot be instantiated directly. It has many of the basic
fields.
DT_Abstract2 extends
DT_Abstract1 and inherits all its fields and methods.
DomainTheory is the
non-abstract class that can be instantiated. When I refer to this class, I will be
talking about all of its fields and methods including the inherited ones.
The principal fields of a domain theory, as a data structure, are these:
ctxt is a back-pointer to the enclosing
context.
There are fields to hold all of the attributes a domain theory may have. The field
names match the attribute names.
addressTerms is a boolean indicating
whether this domain theory concerns terms of reference or terms of address.
theory is a TreeMap holding the kin term
definitions in Horn Clause syntax.
dyadsUndefined and
dyadsDefined are both
DyadTMaps. See the discussion
below.
issuesForUser is a TreeMap<String,
ArrayList<Issue>> that stores potential Suggestions
(see below).
polygamyOK is a boolean that controls whether
SILKin will look for a concurrent spouse when searching for possible structures in a kin
term definition.
levelsOfRecursion is an int that controls how many levels
of recursion SILKin will expand when creating the expanded version of a definition.
Reference vs. Address
Most cultures use the same kin terms to refer to a kinsman and to address them. But a few
have different rules for addressing, so every context allows for up to two domain theories.
When the User is gathering data, a toggle in the 'Context' menu declares whether there are
separate terms of address in this culture.
When there are no separate terms of address (the default) then every time the User enters
the kin term that Ego calls Alter, SILKin automatically records that as the term Ego uses
to address Alter. Those default terms of address will appear in the Detail Display of the
SIL_Edit screen, but they are not editable.
If at some point the User discovers that separate terms of address exist, she can change
the toggle in the 'Context' menu. That will make the Address Terms field in the Detail
Display editable (so she can make retroactive changes) and henceforth she will need to
manually enter the terms of address as well as the terms of reference.
Because a context's address terms are a separate domain theory, anytime the User calls
for a learning session ('Get New Suggestions') SILKin will analyze both domain theories and
generate issuesForUser on each. In that case, when
she decides to 'Act on Suggestions' she must choose which set to work on.
Horn Clause Syntax
KinTermDef
The theory field of a DomainTheory holds a
TreeMap<String, KinTermDef> where the key is the kin term and the value is the
data structure that will define it in Horn Clause syntax, a
KinTermDef. An instance of a
KinTermDef (often referred to as a ktd) has these important fields:
domTh is a back pointer to the enclosing
domain theory.
clauseHead is a Literal that is printed
at the beginning of a Horn Clause. It is always 'the-kin-term(Alter, Ego)'.
kinTerm is a String. It could be
obtained from the clauseHead, but is stored separately for convenience.
definitions is an ArrayList<Object,>
whose elements are in fact ClauseBody instances.
expandedDefs is another ArrayList<Object,>
whose elements are ClauseBody instances. The difference is that the ClauseBodies in
definitions may contain any legal predicates, including
Standard_Macros
(e.g. 'brother') and kin terms from this culture that are defined elsewhere.
But the ClauseBodies in expandedDefs may only use
'primitive' predicates that are defined in SILKin: father, mother, son, daughter, etc.
(For more details, see Literal.)
The gloss field is initially empty until a session
is saved to the SILK file. Then it is generated and written out. The gloss is used only when
showing the User a potential definition for a currently undefined relationship during
'Act on Suggestions.'
The eqcSigExact and
eqcSigStruct fields hold strings. These are the
'signatures' of this definition: a concatenation of the PCStrings for the path between
Ego and Alter. The former uses gender-specific PCStrings while the latter uses gender-neutral
terms.
For example, the eqcSigExact for the English kin term 'uncle' is FaBro_FaSisHu_MoBro_MoSisHu.
It is composed of the kin types (PCStrings) of the four expandedDefs, in alphabetic order, separated by
underscores. The eqcSigStruct is PSib_PSibSp. This has only two elements instead of four because
it uses the gender-neutral symbol 'P' for parent, 'Sib' for sibling, and 'Sp' for spouse.
NOTE: At the end of my stewardship of the SILKin code,
glosses are involved in an unresolved issue. In English and many other languages, we express
a relationship verbally by saying 'Bob is John's father's sister's husband.' A gloss is
exactly that: 'father's sister's husband.' The PCString that is used in the code to represent
a kin type (e.g. 'FaSisHu') is structurally identical to the gloss in English.
When the User chooses to see menus and screens in
French, everything they see is in French, including
all suggestions, except for a gloss. In French they would say Bob is John's husband of the
sister of his father, or 'Hu de Sis de Fa.' Since Gary Simons is the author of the XSL files that
format suggestions into English or French, he must play a key role in re-formatting a gloss.
Unfortunately, the press of other duties has prevented him from doing so. Therefore, the
gloss shown to a French-speaking User is still in the English style.
This is not a major flaw since very few users pay any attention to suggestions. But the issue
should be resolved.
ClauseBody
The primary purpose of a KinTermDef is, of course, to define a kin term in Horn Clause syntax.
The definitions and
expandedDefs fields contain instances of the
ClauseBody class. As a data structure
a ClauseBody has these fields:
Four fields hold descriptors of this clause:
the level is an int showing how far above or below
Ego the Alter appears in a family tree chart. Ego's siblings are at level 0. His parents
are at level +1, his children at level -1, etc.
the pcCounter records how many parental
or child links are contained in this clause.
the sCounter records how many spousal
links are contained in this clause.
the starCounter records how many 'star links'
are contained in this clause. A star link is a chartable UDP that connects two people
(e.g. adoption).
ktd is a back-pointer to the enclosing KinTermDef.
pcString and
pcStringStructural are strings showing the kin type (both
strict and gender-neutral) of this clause.
the body is an ArrayList<Object> that
holds instances of the Literal class.
Literal
The Literal class models the components
of a clause. Just like DomainTheory, this class is the location of much of the action when
learning or reasoning about Horn Clauses and has many methods. So the code is broken into three files:
LiteralAbstract1 is the
base class, and is abstract. It is the physical location of a Literal's fields.
As a data structure, a Literal has only two significant fields:
the predicate field holds an instance of
Predicate. This is the 'operator' or 'function' that is applied
to the arguments. Predicates have a
category field that holds an instance of one of the
three subclasses of Predicate:
SILKin determines what type of predicate a literal uses by querying this field. This is useful
when expanding a kin term definition into an expandedDef. If a predicate is primitive
(i.e. one of the built-in predicates) then simply copy it. Otherwise, look up its defintion
and put the definition(s) into the expand definition(s).
args is an ArrayList<Object> that in fact
holds Arguments.
Argument
The Argument class is an abstract
class that is the superclass of anything that can be an argument to a Horn Clause. It has four
instantiable subclasses:
The Variable class represents
symbols (by convention, capital letters) that are variables in a Horn Clause. Their value is
determined when applying a clause to a specific set of individuals, traits or families.
The Constant class represents
symbols (numbers, strings, or lowercase letters) that are constants in a Horn Clause. That is,
their value is fixed in the Horn Clause, and cannot be changed. This class has a further
subclass: ArgString
The StackMarkerObj class
represents special objects that can be added to a stack of arguments. It is
used in the 'expand' methods to keep track of the number of levels of recursion that have
already been expanded.
The Literal class is also a subclass of Argument. Horn Clause syntax allows a Literal
to have a Literal as an argument. In SILKin, this is almost always used in negation. If we
wish to stipulate that person 'A' is alive, we would say 'not(dead(A))'; ('not' is a unary
predicate).
All subclasses of Argument inherit these fields:
argType is a string that declares what type
of argument this is. Each subclass has a static String that is its argType.
valueType is a string that declares what type
the value of this argument must be.
argName is a string that declares what the name
of this argument is in this particular literal.
value is an ArrayList<Object> that can hold
multiple values (in the rare cases where that makes sense, like a multi-valued UDP).
For example: Here is a Horn Clause (one of four for
the English term 'uncle'):
This entire Horn Clause is a ClauseBody, one of four in the
definitions field of the KinTermDef for 'uncle.' The literal 'uncle(Alter, Ego)' is
the headPred. There are two literals in the body
field's ArrayList. In the first literal, 'father' is the predicate (a primitive one) and the
two arguments are 'B' and 'Ego.' Both of those args begin with a capital letter, so they are
variables.
Dyads Defined/Undefined
As a user gathers data about the target culture, she records what a particular Ego calls
a particular Alter (and also possibly distinct address terms). This information is stored
directly on the Node for Alter that is part of Ego's row in the KinTermMatrix. When
anthropologists think about this data, they think in terms of dyads.
A dyad is defined as a triple: a pair of individuals and the kin term. When an anthropologist
has gathered enough dyads that all have the same kin term, she will try to figure out the
pattern and deduce the definition of that kin term.
SILKin creates instances of the Dyad
class as the user gathers data. If the kin term already has a definition in the appropriate
DomainTheory, the dyad will be stored on that domain theory's dyadsDefined
field which is a DyadTMap. If no
definition exists yet, the dyad will be stored on another DyadTMap,
dyadsUndefined.
As a data structure, a Dyad has these important fields:
the ego and
alter fields hold Individuals.
the kinTerm field holds a string, the kin term.
the pcString and
pcStringStructural fields hold strings that document
the series of relationships (predicates) along the path between this Ego and this Alter.
The pcStringStructural uses only gender-neutral symbols.
the challenged and
confirmed fields hold booleans with default value of
'false.' When SILKin is searching
for a definition that fits all the data User has collected for a particular kin term, sometimes
only one or two dyads don't fit an otherwise promising candidate definition. SILKin will
issue a suggestion to the User that perhaps these dyads are wrong, and put the value 'true'
in the 'challenged' field.
If the User confirms that a dyad is correct, SILKin will put 'true' in the 'confirmed' field
and never challenge that dyad again.
the path field is an ArrayList<Object>
that holds Individuals. This list contains all the 'linking kinsmen' between Ego and Alter.
A DyadTMap is a convenient
extension of the TreeMap class. It simply adds one method to the normal TreeMap methods:
dyAdd(a-dyad). The structure of the nested TreeMap in a
DyadTMap is TreeMap<String, TreeMap<String, ArrayList<Dyad>>>.
The top-level key is the kin term, and its value is the second-level TMap. The second-level
key is the pcString and the value is a list of all the dyads for this kin term that have
that pcString. The pcString, of course, is the index into the
ClauseIndex that plays a central
role in learning sessions.
Issues for Users (Suggestions)
Although users frequently ignore suggestions, the learning sessions that produce suggestions
are the academic focus of SILKin. The process for generating Suggestions
is discussed above. The data structures
for suggestions have the abstract superclass Issue.
All of its subclasses inherit two fields:
kinTerm is a string: the term SILKin was trying to
to learn when the issue arose.
processed is a boolean indicating whether the
User has acted on this suggestion yet.
There are seven subclasses that can be instantiated; one for each type of suggestion SILKin
might make.
An Anomaly is generated when a small
number of dyads seem to violate a candidate definition that is otherwise a good fit. It has
two significant fields:
basis is an ArrayList<Object> holding all
the KinTermDefs, ClauseBodies, or contradictions that triggered this Anomaly.
anomalyType is a string with one of three values:
falsePos
falseNeg
nearMiss
A ProposedDef is generated when
SILKin has found a definition in the Library of known kinship systems that covers all the kin
types seen so far for the kin term being learned (and perhaps a few more). It has
two significant fields:
ktd is the KinTermDef being proposed.
eqc is a
KTD_EQC. This is the
equivalence class of definitions that contains all the KinTermDefs from various languages
that cover the same set of kin types.
A ComposedDef is built by
induction when SILKin has failed to find any Library definitions that might be a good fit.
SILKin will attempt to compose a definition only if the 'Do Pure Induction' option is
selected in the 'Edit Prefs' screen. This is exceedingly rare. A ComposedDef has three
fields that document how it was built:
ktd is the KinTermDef being proposed.
languagesUsed is an ArrayList<Object>
that holds strings, the names of languages used in composing this definition. If pure
induction was used, the list will include the term "LGG_Induction".
Likewise, cbsUsed is an ArrayList<Object>
that holds strings identifying all ClauseBodies used in composing this definition. The
strings will be of the form 'languageName + ":" + kinTerm + ":" + cbSeqNmbr'.
A SynonymCandidate is generated when
SILKin has found dyads that seem to have the same definition, but they have two
different kin terms. There are too many dyads of each type for this to be a few errors.
A SynonymCandidate has four fields:
ktermDyads is an ArrayList<Object> that holds
all the dyads that use the 'primary' (more numerous) kin term.
otherTermDyads is an ArrayList<Object> that holds
all the dyads that use the 'secondary' (less numerous) kin term.
pairOfTerms is a string composed of the primary term,
a slash, and the secondary term.
pcStringsCovered is an ArrayList<Object>
that lists all the PCStrings covered by each of these kin terms.
An UmbrellaCandidate is generated when
SILKin has found two or more kin terms that have numerous dyads; i.e. they seem to have
solid definitions. But there is also a kin term that seems to cover all of these kin types.
In English, a good example is the term 'grandparent.' This term covers both grandfathers and
grandmothers; it is an umbrella term. An UmbrellaCandidate has only two fields:
subTerms is a TreeMap<String, ArrayList<Object>>.
The key is a string, a subTerm. The value is a 'supportList' that has 2 or 3 elements:
an ArrayList<Object> of pcStringscovered by this sub term.
an ArrayList<Object> of dyadscovered by this sub term.
optionally, an int: 1 or 2 (which of 2 thresholds were used)
pcStringsCovered is an ArrayList<Object>
that lists all the PCStrings covered by these kin terms.
An OverlapCandidate is generated when
SILKin has found two kin terms that seem to have solid definitions, but they both cover a particular
kin type. Neither one is an umbrella that covers all of the other. And they aren't synonyms.
In a few languages, these overlaps occur. As a data structure, an OverlapCandidate has four fields:
ktermDyads is an ArrayList<Object> that holds
all the dyads for the 'primary' kin term. (Recall that kinTerm
is an inherited field from Issue.)
otherTerm is a string, the other or 'secondary' kin
term.
otherTermDyads is an ArrayList<Object> that holds
all the dyads for the 'secondary' kin term.
intersection is an ArrayList<Object> that holds
the PCStrings for all kin types that are covered by both definitions.
A DataRequest is generated when
SILKin has found two or more kin term definitions in the Library that cover all the kin types
of the kin term we're trying to learn. But each of the candidates also covers some other kin
types. A data request identifies the most powerful new dyad(s) that can eliminate the most
candidates. If there are two or more persons already in this context that have the kin type
we are seeking, SILKin will ask the User to provide a kin term for that pair. Otherwise,
it will describe the kin type and ask User to find two such persons and record their
kin term. A DataRequest has only one field:
relatedCB_Ptrs is an ArrayList<Library.CB_Ptr>.
The Library.CB_Ptr class
implements a ClauseBody pointer (to be stored in CB_EQCs in the ClauseIndex).
It is used as a compact stand-in for a ClauseBody in set operations during learning.
Parsers
A parser is not a data structure, but the parsers in SILKin create most of the data
structures, so I felt they should be discussed in this section.
There are several open source parsers available that could have been configured to handle
the various file formats SILKin uses. But my first significant programming task in Java
was a class project: building a Java parser. Because I had a template and experience, I
decided to build a parser for human-prepared domain theory files, i.e. kinship systems that
I had created from my anthropological research. This allowed freedom, flexibility, and
precision that I found valuable, so I ended up hand-coding all parsers in SILKin.
My ParserDomainTheory and
ParserGEDCOM evolved slightly
over time, but were pretty stable over the entire project. My initial format for a SILK file
was strictly ad hoc, and my
ParserSILKFilePreXML
(originally just SILKParser) reflected that. Gary Simons suggested that making the SILK file
a proper XML format would be profitable, so the format was changed and a new
ParserSILKFile was written.
Thus, there are only three parsers you will ever need to deal with, and all three follow the
same template.
ParserDomainTheory and ParserSILKFile were considered foundational, so I prepared a
Context Free Grammar documenting each of them:
DomainTheoryGrammar and
SILKFileGrammar. These
grammars are not guaranteed to be up-to-date, but they are a good map to the parsers. The
names of the parsing methods correspond to the elements in the grammars. ParserSILKFile
is technically a subclass of ParserDomainTheory, but it turned out to not inherit much of
value from the superclass except the error() method.
ParserGEDCOM was created
in response to a user request for the ability to import and export files from other
genealogical software.
GEDCOM
is the only widely used interchange format, and it is rather
loose. It was developed by the Mormon church to enable members to upload their genealogies
into the Temple databases. Because the church wanted to accept files from as many sources
as possible, they allowed multiple ways to encode the same data. I tried to implement the
GEDCOM 5.5 standard. To my knowledge this capability has been seldom, if ever, used. There
is no grammar for this parser.
ParserSILKFile is
obviously used constantly. It had been stable for some time when a group of users complained
about the
list of prohibited characters for
kin terms. They wanted to use almost any
character in a kin term. This required minor amendments to the parser, but a major re-think
of the Deterministic Finite Automaton (DFA) that identifies legal tokens for the parser.
Documentation of the DFA class gives the
file formats for the Kin_DFA file found in the Library
folder.