uk.ac.essex.malexa.nlp.dp.GuiTAR
Class IOXMLUtils

java.lang.Object
  extended by uk.ac.essex.malexa.nlp.dp.GuiTAR.IOXMLUtils

public abstract class IOXMLUtils
extends Object

A wrapper class that hides low-level XML reading and writing.

Version:
1.1
Author:
Mijail A. Kabadjov

Field Summary
static String ANCHOR_TAG
           
static String ANTE_TAG
           
static String ANTECEDENT_PROPERTY_NAME
           
static String CATEGORY_PROPERTY_NAME
           
static String CURRENT_PROPERTY_NAME
           
static String DEPTHSCORE_CENTRE_PROPERTY_NAME
           
static String DEPTHSCORE_LEFT_PROPERTY_NAME
           
static String DEPTHSCORE_RIGHT_PROPERTY_NAME
           
static String DIV_TAG
           
static String ENCODING_LATIN1
           
static String ENCODING_UTF8
           
static String FEMENINE_PROPERTY_VALUE
           
static String FIRSTPERSON_PROPERTY_VALUE
           
static String GENDER_PROPERTY_NAME
           
static String ID_PROPERTY_NAME
           
static String IDENTITY_RELATION
           
static String IMPERSONAL_PROPERTY_VALUE
           
static String LEXEME_PROPERTY_NAME
           
static String MASCULINE_PROPERTY_VALUE
           
static String NEID_PROPERTY_VALUE_PREFIX
           
static String NEVEID_PROPERTY_NAME
           
static String NEVEID_PROPERTY_VALUE_PREFIX
           
static String NP_HEAD_TAG
           
static String NP_MODIFIER_TAG
           
static String NP_TAG
           
static String NUMBER_PROPERTY_NAME
           
static String PARAGRAPH_TAG
           
static String PARSER_SENTENCE_TAG
           
static String PERSON_PROPERTY_NAME
           
static String PLURAL_PROPERTY_VALUE
           
static String POS_PROPERTY_NAME
           
static String RELATION_PROPERTY_NAME
           
static String SECONDPERSON_PROPERTY_VALUE
           
static String SECTION_TAG
          PARAMETERS
static String SEGMENT_TAG
           
static String SENTENCE_TAG
           
static String SINGULAR_PROPERTY_VALUE
           
static String THIRDPERSON_PROPERTY_VALUE
           
static String TITLE_TAG
           
static String UNIT_TAG
           
static String UTTERANCEID_PROPERTY_VALUE_PREFIX
           
static String VE_TAG
           
static String VEID_PROPERTY_VALUE_PREFIX
           
static String WORD_TAG
           
 
Constructor Summary
IOXMLUtils()
           
 
Method Summary
static Map buildIdToNodeMap(Document doc, String tagName)
          Builds an id-to-node map for a given tag name for fast retrieval.
static String findLastVerb(Node node)
          Assuming that verbal constructions are represented horizontally (as sibling nodes) in the XML of the GNOME Corpus, this method traverses the siblings of node (parameter) until the last one is reached.
static Element getElementById(Document doc, String elementId)
          This method returns a DOM tree node specified by its id, e.g. id="ne233".
static Element getElementById(String elementId, Map m)
          This is a more efficient implementation of the method getElementById( Document, String ), as it makes use of a pre-built hash table (hashmap).
static Vector getElementsByTagNameNoEmbedding(Node node, Set tagsIgnored, String tagSearched)
          A recursive method similar to getElementsByTagName from the DOM interface, which in contrast skips nodes with names given by the Set tagsIgnored.
static Vector getElementsByTagNameNoEmbedding(Node node, String tagIgnored, String tagSearched)
          A recursive method similar to getElementsByTagName from the DOM interface, which in contrast skips nodes with name given by the parameter tagIgnored.
static String getLeftCollocate(Node node, Node uttNode)
           
static String getLeftCollocate(NominalGroup ne)
          Code from PersonalPronoun
static Node getNextNodeByTagName(Node node, String tagName, Node tree)
          This method returns the node following a given node (first parameter) in a preorder traversal of the DOM tree whose tag name equals the second parameter of this method.
static int getNumberOfIntermediateNodes(Node node1, Node node2, String tagName)
          A method that returns the number of intermediate nodes with tag name given as a parameter between node1 and node2 (in the DOM tree).
static Vector getNumberOfMatchingNodes(Node node1, Node node2, String tagName)
          A recursive method that counts the nodes with the given tag name that lie within the tree represented by node1 and before node2, if node2 is present in that tree.
static Node getParentNodeByName(Node node, Node topNode, String tagName)
          Returns the first parent node with the specified tag name
static String getPhraseString(Node cf)
           
static Node getPreviousNodeByTagName(Node node, String tagName, Node tree)
          This method returns the node preceding a given node (first parameter) in a preorder traversal of the DOM tree whose tag name equals the second parameter of this method.
static String getRightCollocate(Node node, Node uttNode)
           
static String getRightCollocate(NominalGroup ne)
           
static boolean isBeforeNode(Node node1, Node node2, Node parent)
          Performs a preorder traversal of the tree represented by the node parent, starting at node1 and continuing until either node2 is found or the end of the tree has been reached.
static boolean isElementOfNP(Node node, Node upmostNode)
          Tests whether the node, given as a parameter, is part of an NP.
static boolean isNodeInTree(Node node, Node tree)
          Checks whether node (first parameter) is part of the tree represented by the second parameter.
static Document load(String inputFileName)
          Loads an XML file and parses it into a DOM Tree.
static String processWord(Node word)
          Checks whether the word node, given as a parameter, contains a verb and if so this verb is returned.
static void save(Document xmlDocument, String outputFileName)
          Stores an XML document (DOM Tree) to a file assuming an encoding of UTF8.
static void save(Document xmlDocument, String outputFileName, String encoding)
          Stores an XML document (DOM Tree) to a file.
static Vector toStringsVector(Vector wordNodes)
          Converts a Vector of word Nodes to a Vector of Strings (all characters are lowercase).
static Vector toStringsVectorKeepLetterCase(Vector wordNodes)
          Converts a Vector of word Nodes to a Vector of Strings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SECTION_TAG

public static final String SECTION_TAG
PARAMETERS

See Also:
Constant Field Values

UNIT_TAG

public static final String UNIT_TAG
See Also:
Constant Field Values

SEGMENT_TAG

public static final String SEGMENT_TAG
See Also:
Constant Field Values

DIV_TAG

public static final String DIV_TAG
See Also:
Constant Field Values

PARAGRAPH_TAG

public static final String PARAGRAPH_TAG
See Also:
Constant Field Values

SENTENCE_TAG

public static final String SENTENCE_TAG
See Also:
Constant Field Values

PARSER_SENTENCE_TAG

public static final String PARSER_SENTENCE_TAG
See Also:
Constant Field Values

ANTE_TAG

public static final String ANTE_TAG
See Also:
Constant Field Values

ANCHOR_TAG

public static final String ANCHOR_TAG
See Also:
Constant Field Values

NP_TAG

public static final String NP_TAG
See Also:
Constant Field Values

VE_TAG

public static final String VE_TAG
See Also:
Constant Field Values

WORD_TAG

public static final String WORD_TAG
See Also:
Constant Field Values

NP_HEAD_TAG

public static final String NP_HEAD_TAG
See Also:
Constant Field Values

NP_MODIFIER_TAG

public static final String NP_MODIFIER_TAG
See Also:
Constant Field Values

TITLE_TAG

public static final String TITLE_TAG
See Also:
Constant Field Values

POS_PROPERTY_NAME

public static final String POS_PROPERTY_NAME
See Also:
Constant Field Values

LEXEME_PROPERTY_NAME

public static final String LEXEME_PROPERTY_NAME
See Also:
Constant Field Values

ID_PROPERTY_NAME

public static final String ID_PROPERTY_NAME
See Also:
Constant Field Values

NEVEID_PROPERTY_NAME

public static final String NEVEID_PROPERTY_NAME
See Also:
Constant Field Values

CURRENT_PROPERTY_NAME

public static final String CURRENT_PROPERTY_NAME
See Also:
Constant Field Values

RELATION_PROPERTY_NAME

public static final String RELATION_PROPERTY_NAME
See Also:
Constant Field Values

ANTECEDENT_PROPERTY_NAME

public static final String ANTECEDENT_PROPERTY_NAME
See Also:
Constant Field Values

CATEGORY_PROPERTY_NAME

public static final String CATEGORY_PROPERTY_NAME
See Also:
Constant Field Values

PERSON_PROPERTY_NAME

public static final String PERSON_PROPERTY_NAME
See Also:
Constant Field Values

NUMBER_PROPERTY_NAME

public static final String NUMBER_PROPERTY_NAME
See Also:
Constant Field Values

GENDER_PROPERTY_NAME

public static final String GENDER_PROPERTY_NAME
See Also:
Constant Field Values

DEPTHSCORE_LEFT_PROPERTY_NAME

public static final String DEPTHSCORE_LEFT_PROPERTY_NAME
See Also:
Constant Field Values

DEPTHSCORE_CENTRE_PROPERTY_NAME

public static final String DEPTHSCORE_CENTRE_PROPERTY_NAME
See Also:
Constant Field Values

DEPTHSCORE_RIGHT_PROPERTY_NAME

public static final String DEPTHSCORE_RIGHT_PROPERTY_NAME
See Also:
Constant Field Values

IDENTITY_RELATION

public static final String IDENTITY_RELATION
See Also:
Constant Field Values

UTTERANCEID_PROPERTY_VALUE_PREFIX

public static final String UTTERANCEID_PROPERTY_VALUE_PREFIX
See Also:
Constant Field Values

NEID_PROPERTY_VALUE_PREFIX

public static final String NEID_PROPERTY_VALUE_PREFIX
See Also:
Constant Field Values

VEID_PROPERTY_VALUE_PREFIX

public static final String VEID_PROPERTY_VALUE_PREFIX
See Also:
Constant Field Values

NEVEID_PROPERTY_VALUE_PREFIX

public static final String NEVEID_PROPERTY_VALUE_PREFIX
See Also:
Constant Field Values

FIRSTPERSON_PROPERTY_VALUE

public static final String FIRSTPERSON_PROPERTY_VALUE
See Also:
Constant Field Values

SECONDPERSON_PROPERTY_VALUE

public static final String SECONDPERSON_PROPERTY_VALUE
See Also:
Constant Field Values

THIRDPERSON_PROPERTY_VALUE

public static final String THIRDPERSON_PROPERTY_VALUE
See Also:
Constant Field Values

SINGULAR_PROPERTY_VALUE

public static final String SINGULAR_PROPERTY_VALUE
See Also:
Constant Field Values

PLURAL_PROPERTY_VALUE

public static final String PLURAL_PROPERTY_VALUE
See Also:
Constant Field Values

FEMENINE_PROPERTY_VALUE

public static final String FEMENINE_PROPERTY_VALUE
See Also:
Constant Field Values

MASCULINE_PROPERTY_VALUE

public static final String MASCULINE_PROPERTY_VALUE
See Also:
Constant Field Values

IMPERSONAL_PROPERTY_VALUE

public static final String IMPERSONAL_PROPERTY_VALUE
See Also:
Constant Field Values

ENCODING_UTF8

public static final String ENCODING_UTF8
See Also:
Constant Field Values

ENCODING_LATIN1

public static final String ENCODING_LATIN1
See Also:
Constant Field Values
Constructor Detail

IOXMLUtils

public IOXMLUtils()
Method Detail

load

public static Document load(String inputFileName)
Loads an XML file and parses it into a DOM Tree.

Parameters:
inputFileName - The name of the XML file.
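
The implementation of load is not shown on this page. As a minimal sketch, assuming it is built on the standard JAXP parser API, loading a file into a DOM tree could look like the following. The class name LoadSketch and the method loadXml are hypothetical and not part of GuiTAR.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not the GuiTAR implementation.
final class LoadSketch {
    // Parses the named XML file into a DOM Document using JAXP.
    static Document loadXml(String inputFileName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File(inputFileName));
        // Merge adjacent text nodes so later traversals see a clean tree.
        doc.getDocumentElement().normalize();
        return doc;
    }
}
```

The real method declares no checked exceptions in its signature, so it presumably handles parser errors internally; the sketch simply propagates them.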

save

public static void save(Document xmlDocument,
                        String outputFileName)
Stores an XML document (DOM Tree) to a file assuming an encoding of UTF8.

Parameters:
xmlDocument - The XML document to be stored
outputFileName - The name of the new XML file

save

public static void save(Document xmlDocument,
                        String outputFileName,
                        String encoding)
Stores an XML document (DOM Tree) to a file.

Parameters:
xmlDocument - The XML document to be stored
outputFileName - The name of the new XML file
encoding - The encoding to be used (e.g. UTF8, LATIN-1)
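
How the two save variants write the file is not documented here. One common way to serialise a DOM tree with an explicit encoding is the JAXP identity Transformer, sketched below. SaveSketch and saveXml are hypothetical names, and the encoding strings "UTF-8" and "ISO-8859-1" are assumptions; the actual values of ENCODING_UTF8 and ENCODING_LATIN1 are listed under Constant Field Values.

```java
import java.io.File;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper, not the GuiTAR implementation.
final class SaveSketch {
    // Serialises the DOM tree to a file using the requested character encoding.
    static void saveXml(Document xmlDocument, String outputFileName, String encoding)
            throws Exception {
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, encoding);
        identity.transform(new DOMSource(xmlDocument),
                           new StreamResult(new File(outputFileName)));
    }

    // Convenience overload that assumes UTF-8, mirroring the two-argument save.
    static void saveXml(Document xmlDocument, String outputFileName) throws Exception {
        saveXml(xmlDocument, outputFileName, "UTF-8");
    }
}
```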

getElementsByTagNameNoEmbedding

public static Vector getElementsByTagNameNoEmbedding(Node node,
                                                     String tagIgnored,
                                                     String tagSearched)
A recursive method similar to getElementsByTagName from the DOM interface, which in contrast skips nodes with name given by the parameter tagIgnored.

Parameters:
node - the node representing the tree to be searched
tagIgnored - the tag name of the nodes to be skipped
tagSearched - the tag name of the nodes to be returned
Returns:
Vector A Vector of all the child nodes of node with name tagSearched, which are not children of nodes with name tagIgnored

getElementsByTagNameNoEmbedding

public static Vector getElementsByTagNameNoEmbedding(Node node,
                                                     Set tagsIgnored,
                                                     String tagSearched)
A recursive method similar to getElementsByTagName from the DOM interface, which in contrast skips nodes with names given by the Set tagsIgnored.

Parameters:
node - the node representing the tree to be searched
tagsIgnored - the tag names of the nodes to be skipped
tagSearched - the tag name of the nodes to be returned
Returns:
Vector A Vector of all the child nodes of node with name tagSearched, which are not children of nodes with a name in tagsIgnored
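
The recursion described for both variants can be sketched as follows. This is an illustration under stated assumptions, not the GuiTAR code: it returns a List rather than a Vector, it assumes that a subtree rooted at an ignored tag is skipped entirely, and it assumes that matching nodes are themselves searched for further matches. NoEmbeddingSketch and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import org.w3c.dom.Node;

// Hypothetical helper, not the GuiTAR implementation.
final class NoEmbeddingSketch {
    // Collects descendants of node named tagSearched, without descending
    // into any element whose name is in tagsIgnored.
    static List<Node> collect(Node node, Set<String> tagsIgnored, String tagSearched) {
        List<Node> found = new ArrayList<>();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE) continue;    // skip text, comments
            if (tagsIgnored.contains(c.getNodeName())) continue;   // skip the whole subtree
            if (tagSearched.equals(c.getNodeName())) found.add(c);
            found.addAll(collect(c, tagsIgnored, tagSearched));    // recurse
        }
        return found;
    }
}
```

The single-tag variant corresponds to calling collect with a one-element set, for example java.util.Collections.singleton(tagIgnored).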

getPreviousNodeByTagName

public static Node getPreviousNodeByTagName(Node node,
                                            String tagName,
                                            Node tree)
This method returns the node preceding a given node (first parameter) in a preorder traversal of the DOM tree whose tag name equals the second parameter of this method. The third parameter specifies whether to search only the tree represented by the parent of the reference node (tree is a non-null reference) or the whole DOM tree (tree is null). (This method is a variant of the getPreviousSibling() method in the DOM API in the sense that it retrieves the previous sibling by tag name.)

Parameters:
node - The reference node
tagName - The name of the tags to be considered
tree - The node delimiting the scope of the search
Returns:
Node The node preceding the reference node in a preorder traversal

getNextNodeByTagName

public static Node getNextNodeByTagName(Node node,
                                        String tagName,
                                        Node tree)
This method returns the node following a given node (first parameter) in a preorder traversal of the DOM tree whose tag name equals the second parameter of this method. The third parameter specifies whether to search only the tree represented by the parent of the reference node (tree is a non-null reference) or the whole DOM tree (tree is null). (This method is a variant of the getNextSibling() method in the DOM API in the sense that it retrieves the next sibling by tag name.)

Parameters:
node - The reference node
tagName - The name of the tags to be considered
tree - The node delimiting the scope of the search
Returns:
Node The node following the reference node in a preorder traversal
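
A preorder successor search of this kind can be sketched with plain DOM navigation. The sketch assumes that a non-null third argument is the root of the subtree that bounds the search and that null means the whole document; PreorderSketch, preorderNext and nextByTagName are hypothetical names, not GuiTAR methods.

```java
import org.w3c.dom.Node;

// Hypothetical helper, not the GuiTAR implementation.
final class PreorderSketch {
    // Returns the preorder successor of n, or null when the walk would
    // leave the subtree rooted at scope (scope == null means no limit).
    static Node preorderNext(Node n, Node scope) {
        if (n.getFirstChild() != null) return n.getFirstChild();
        while (n != null && n != scope) {
            if (n.getNextSibling() != null) return n.getNextSibling();
            n = n.getParentNode();   // climb until an ancestor has a next sibling
        }
        return null;
    }

    // Returns the first element after node, in preorder, whose name is tagName.
    static Node nextByTagName(Node node, String tagName, Node scope) {
        for (Node n = preorderNext(node, scope); n != null; n = preorderNext(n, scope)) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tagName.equals(n.getNodeName())) {
                return n;
            }
        }
        return null;
    }
}
```

The predecessor search is symmetric: step to the previous sibling and descend to its last, deepest descendant, or step to the parent when there is no previous sibling.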

getElementById

public static Element getElementById(Document doc,
                                     String elementId)
This method returns a DOM tree node specified by its id, e.g. id="ne233".

Parameters:
doc - The root of the DOM tree
elementId - The id of the element to look for
Returns:
Element The element whose attribute id is equal to the given id

getElementById

public static Element getElementById(String elementId,
                                     Map m)
This is a more efficient implementation of the method getElementById( Document, String ), as it makes use of a pre-built hash table (hashmap).

Parameters:
elementId - The id of the element to look for
m - The map (hash table) to be searched through
Returns:
Element The element whose attribute id is equal to the given id

buildIdToNodeMap

public static Map buildIdToNodeMap(Document doc,
                                   String tagName)
Builds an id-to-node map for a given tag name for fast retrieval.

Parameters:
doc - The root of the DOM tree
tagName - The tag name of the elements to be indexed
Returns:
HashMap The resulting Map
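
The idea of building the map once and then answering each id lookup in constant time can be sketched as below. The sketch assumes the id is stored in an attribute literally named "id" (as the example id="ne233" suggests; the real attribute name is the value of ID_PROPERTY_NAME). IdMapSketch and buildIdMap are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the GuiTAR implementation.
final class IdMapSketch {
    // Indexes every element named tagName by its "id" attribute (assumed name).
    static Map<String, Element> buildIdMap(Document doc, String tagName) {
        Map<String, Element> map = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (e.hasAttribute("id")) {
                map.put(e.getAttribute("id"), e);
            }
        }
        return map;
    }
}
```

A lookup is then a single map.get(elementId) call, which avoids rescanning the document for every id.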

toStringsVector

public static Vector toStringsVector(Vector wordNodes)
Converts a Vector of word Nodes to a Vector of Strings (all characters are lowercase).

Parameters:
wordNodes - The Vector of word Nodes
Returns:
Vector The vector of Strings

toStringsVectorKeepLetterCase

public static Vector toStringsVectorKeepLetterCase(Vector wordNodes)
Converts a Vector of word Nodes to a Vector of Strings. The original (lower, upper) case is kept.

Parameters:
wordNodes - The Vector of word Nodes
Returns:
Vector The vector of Strings
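
The two conversion methods differ only in whether the case is preserved. A minimal sketch, assuming that the string of a word node is its trimmed text content (this page does not say how the string is obtained), with the hypothetical names WordStringsSketch and toStrings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import org.w3c.dom.Node;

// Hypothetical helper, not the GuiTAR implementation.
final class WordStringsSketch {
    // Converts word nodes to strings; lowercases them unless keepCase is true.
    static List<String> toStrings(List<Node> wordNodes, boolean keepCase) {
        List<String> out = new ArrayList<>(wordNodes.size());
        for (Node w : wordNodes) {
            String text = w.getTextContent().trim();   // assumed source of the word string
            out.add(keepCase ? text : text.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```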

getPhraseString

public static String getPhraseString(Node cf)

getNumberOfMatchingNodes

public static Vector getNumberOfMatchingNodes(Node node1,
                                              Node node2,
                                              String tagName)
A recursive method that counts the nodes with the given tag name that lie within the tree represented by node1 and before node2, if node2 is present in that tree.

Parameters:
node1 - The node where the counting starts
node2 - The node where the counting ends
tagName - The name of the tags to be accounted for
Returns:
Vector A Vector of two elements: (1) true if node2 is a child of node1, false otherwise; (2) the number of nodes with name tagName
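
The recursive count with an early stop can be sketched as follows. Instead of a two-element Vector, the sketch returns a small result object carrying the same two values. It assumes that node1 itself is not counted and that the walk stops as soon as node2 is met; CountSketch, Result and countBefore are hypothetical names.

```java
import org.w3c.dom.Node;

// Hypothetical helper, not the GuiTAR implementation.
final class CountSketch {
    // Holds the two values the documented Vector carries.
    static final class Result {
        boolean reached;   // true if node2 was found below node1
        int count;         // number of nodes named tagName seen before node2
    }

    static Result countBefore(Node node1, Node node2, String tagName) {
        Result r = new Result();
        walk(node1, node2, tagName, r);
        return r;
    }

    // Preorder walk over the descendants of n that stops once stop is met.
    private static void walk(Node n, Node stop, String tagName, Result r) {
        for (Node c = n.getFirstChild(); c != null && !r.reached; c = c.getNextSibling()) {
            if (c == stop) {
                r.reached = true;
                return;
            }
            if (tagName.equals(c.getNodeName())) r.count++;
            walk(c, stop, tagName, r);
        }
    }
}
```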

getNumberOfIntermediateNodes

public static int getNumberOfIntermediateNodes(Node node1,
                                               Node node2,
                                               String tagName)
A method that returns the number of intermediate nodes with tag name given as a parameter between node1 and node2 (in the DOM tree).

Parameters:
node1 - The node where the counting starts
node2 - The node where the counting ends
tagName - The name of the tags to be accounted for
Returns:
int Number of intermediate nodes

isBeforeNode

public static boolean isBeforeNode(Node node1,
                                   Node node2,
                                   Node parent)
Performs a preorder traversal of the tree represented by the node parent, starting at node1 and continuing until either node2 is found or the end of the tree has been reached. Returns true if node2 is found, false otherwise. The method assumes that node2 is not part of the tree represented by node1 and that node1 is not part of the tree represented by node2.

Parameters:
node1 - The first node
node2 - The second node
parent - The parent node
Returns:
boolean True if node1 comes before node2 in the tree represented by parent, false otherwise
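
For comparison, the DOM Level 3 API offers Node.compareDocumentPosition, which answers the same document-order question without an explicit traversal. The sketch below uses that call in place of the traversal described above and omits the restriction to the tree under parent; OrderSketch and isBefore are hypothetical names.

```java
import org.w3c.dom.Node;

// Hypothetical helper using a different technique from the documented traversal.
final class OrderSketch {
    // True if node2 follows node1 in document (preorder) order.
    static boolean isBefore(Node node1, Node node2) {
        short position = node1.compareDocumentPosition(node2);
        return (position & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
    }
}
```

When one node contains the other, compareDocumentPosition also sets the containment bits, which is the case the documented method excludes by assumption.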

isNodeInTree

public static boolean isNodeInTree(Node node,
                                   Node tree)
Checks whether node (first parameter) is part of the tree represented by the second parameter.

Parameters:
node - The node to be searched for
tree - The tree defining the scope of the search
Returns:
boolean True if node "node" is part of node "tree", false otherwise
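
A membership test of this kind does not need to search the tree downwards; climbing from the node towards the root is enough. A minimal sketch, assuming that a node counts as part of the tree when it is the tree root itself or one of its descendants (InTreeSketch and inTree are hypothetical names):

```java
import org.w3c.dom.Node;

// Hypothetical helper, not the GuiTAR implementation.
final class InTreeSketch {
    // True if node is tree itself or has tree among its ancestors.
    static boolean inTree(Node node, Node tree) {
        for (Node n = node; n != null; n = n.getParentNode()) {
            if (n == tree) return true;
        }
        return false;
    }
}
```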

getLeftCollocate

public static String getLeftCollocate(NominalGroup ne)
Code from PersonalPronoun


getLeftCollocate

public static String getLeftCollocate(Node node,
                                      Node uttNode)

processWord

public static String processWord(Node word)
Checks whether the word node, given as a parameter, contains a verb and if so this verb is returned.

Parameters:
word - The word to be processed
Returns:
String the verb if the word is a verb, null otherwise

isElementOfNP

public static boolean isElementOfNP(Node node,
                                    Node upmostNode)
Tests whether the node, given as a parameter, is part of an NP. Starts from node and goes up the DOM tree until either an NP node is found or upmostNode is reached.

Parameters:
node - The starting node of the search
upmostNode - The node, beyond which no search is done
Returns:
boolean True if node is part of an NP, false otherwise

getRightCollocate

public static String getRightCollocate(NominalGroup ne)

getRightCollocate

public static String getRightCollocate(Node node,
                                       Node uttNode)

findLastVerb

public static String findLastVerb(Node node)
Assuming that verbal constructions are represented horizontally (as sibling nodes) in the XML of the GNOME Corpus, this method traverses the siblings of node (parameter) until the last one is reached. Any word whose part of speech is a verb is stored along the way, so that at the end the rightmost verb of a possibly more complex verbal construction is returned.

Parameters:
node - The node holding the first verb of a (possibly) more complex construction
Returns:
String The rightmost verb
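
The sibling walk can be sketched as below. Because this page gives neither the name of the part-of-speech attribute nor the tag values that mark a verb, the sketch takes both as parameters (posAttr and verbPrefix, both hypothetical) and assumes the verb string is the trimmed text content of the word element. LastVerbSketch and lastVerb are hypothetical names.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not the GuiTAR implementation.
final class LastVerbSketch {
    // Walks node and its following siblings, remembering the last element
    // whose part-of-speech attribute starts with verbPrefix.
    static String lastVerb(Node node, String posAttr, String verbPrefix) {
        String last = null;
        for (Node n = node; n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element e = (Element) n;
            // getAttribute returns "" when the attribute is absent.
            if (e.getAttribute(posAttr).startsWith(verbPrefix)) {
                last = e.getTextContent().trim();
            }
        }
        return last;   // null if no verb was seen
    }
}
```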

getParentNodeByName

public static Node getParentNodeByName(Node node,
                                       Node topNode,
                                       String tagName)
Returns the first parent node with the specified tag name.

Parameters:
node - The node
topNode - The node beyond which the search does not continue
tagName - The tag name of the parent node to look for
Returns:
Node The parent node with the specified tag name, or topNode if no such node exists
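
The bounded upward search, including the fallback to topNode, can be sketched as follows. ParentSketch and parentByName are hypothetical names; the sketch assumes the search starts at the parent of node rather than at node itself.

```java
import org.w3c.dom.Node;

// Hypothetical helper, not the GuiTAR implementation.
final class ParentSketch {
    // Climbs from node towards topNode and returns the first ancestor named
    // tagName; returns topNode when no such ancestor is found below it.
    static Node parentByName(Node node, Node topNode, String tagName) {
        for (Node p = node.getParentNode(); p != null && p != topNode; p = p.getParentNode()) {
            if (tagName.equals(p.getNodeName())) return p;
        }
        return topNode;
    }
}
```

Because topNode is also the fallback value, a caller that needs to distinguish a real match from the fallback has to compare the name of the returned node with tagName.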