
Compiling the grammar and lexicon


To enable the parser to use term unification, the grammar and lexicon must be compiled from their feature-graph form into a flat term representation.

Suppose, for example, that we define an agreement feature containing number and person. To check whether two agreement attributes match, we must then try to unify the two graphs using graph unification, which is comparatively inefficient.

However, the information we are really interested in is held in the terminal nodes of the graphs. If we expand all the possible paths in the attributes so that every equation is expressed only in terms of terminal nodes, then the parser can use the more efficient term unification. This is what the Squirrel system does.
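As a minimal sketch of why this helps, assume the agreement feature of np and vp has already been expanded into its terminal attributes num and pers. An equation requiring the subject's agreement to equal the verb phrase's agreement then reduces to shared variables in a flat term, and Prolog's built-in term unification does the checking with no graph traversal. The predicates `s_rule/3`, `lex/2` and `sentence/2`, and the two-argument np/vp terms, are hypothetical illustrations and not Squirrel's actual compiled format.

```prolog
% HYPOTHETICAL compiled rule S -> NP VP. The graph equation
%   <np agreement> = <vp agreement>
% has been expanded into equations over terminal paths:
%   <np agreement num>  = <vp agreement num>
%   <np agreement pers> = <vp agreement pers>
% which are expressed simply by sharing the variables Num and Pers.
s_rule(s, np(Num, Pers), vp(Num, Pers)).

% Hypothetical lexical terms; only the agreement values are shown.
lex(she,   np(sing, 3)).
lex(they,  np(plur, 3)).
lex(walks, vp(sing, 3)).
lex(walk,  vp(plur, _)).   % person left open: unifies with any value

% sentence(+Subj, +Verb): succeeds if subject and verb agree,
% using nothing but ordinary term unification.
sentence(Subj, Verb) :-
    lex(Subj, NP),
    lex(Verb, VP),
    s_rule(s, NP, VP).
```

Here `?- sentence(she, walks).` and `?- sentence(they, walk).` succeed, while `?- sentence(she, walk).` fails because `sing` does not unify with `plur`.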

In order to compile out all the possible paths in the attributes of categories, we must know what those possible paths are. This is the purpose of the gramdef file. It contains information about the (first-level) attributes that a syntactic category may have, for example:

category_structure( s,  [mood,npslash] ).  
category_structure( np, [gender,agreement,case,quest,npslash] ).  
category_structure( vp, [agreement,vform,neg,quest, 
                         npslash,vslash,aslash,modal] ).
The gramdef file also records which further attributes or values each of these attributes may have. If an attribute is declared to be a feature, then it contains other attributes:
domain( agreement, feature, [num,pers] ).
If an attribute is declared to be atomic, then it takes one of the listed atomic values and cannot contain other attributes:
domain( num, atomic, [sing,plur] ).  
domain( pers, atomic, [1,2,3] ).
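The sketch below, assuming SWI-Prolog with `library(lists)`, shows one way declarations of this kind could be used to compile out the terminal paths of a category and build a flat term with one argument per path. The predicate names `terminal_path/2`, `category_paths/2`, `flat_term/2` and `path_value/4` are hypothetical. Treating an attribute with no `domain/3` declaration as atomic is an assumption, made only because this excerpt gives no declarations for `gender`, `case`, `quest` or `npslash`.

```prolog
:- use_module(library(lists)).

% Declarations taken from the gramdef excerpt above.
category_structure(np, [gender,agreement,case,quest,npslash]).
domain(agreement, feature, [num,pers]).
domain(num, atomic, [sing,plur]).
domain(pers, atomic, [1,2,3]).

% terminal_path(+Attr, -Path): Path lists the attribute names leading
% from Attr down to a terminal (non-feature) attribute.
% ASSUMPTION: an attribute not declared as a feature is terminal.
terminal_path(Attr, [Attr]) :-
    \+ domain(Attr, feature, _).
terminal_path(Attr, [Attr|Rest]) :-
    domain(Attr, feature, Subs),
    member(Sub, Subs),
    terminal_path(Sub, Rest).

% category_paths(+Cat, -Paths): every terminal path of Cat, in order.
category_paths(Cat, Paths) :-
    category_structure(Cat, Attrs),
    findall(P, (member(A, Attrs), terminal_path(A, P)), Paths).

% flat_term(+Cat, -Term): a term Cat(_, ..., _) with one fresh
% variable per terminal path.
flat_term(Cat, Term) :-
    category_paths(Cat, Paths),
    length(Paths, N),
    functor(Term, Cat, N).

% path_value(+Cat, +Path, +Term, -Value): the argument of Term that
% stands for Path, so equations on paths become argument unifications.
path_value(Cat, Path, Term, Value) :-
    category_paths(Cat, Paths),
    nth1(I, Paths, Path),
    arg(I, Term, Value).
```

For `np` this yields six terminal paths, with `[agreement,num]` and `[agreement,pers]` in positions 2 and 3, so `?- flat_term(np, T), path_value(np, [agreement,num], T, sing).` binds the second argument of the `np` term to `sing`.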

In addition, the gramdef file contains information about the semantic types of the categories. For example, the representation of a sentence is something like a proposition, whereas that of a noun is a property:

type_def( s, p ).  
type_def( n, pty ).
These indicate the type of the semantic annotations on lexical and grammatical entries. As the semantics are given in property theory, a category's semantic annotation can belong to more than one type.

Chris Fox, September 1995