In evaluating the anaphora-resolution algorithm, we examined pronominal
anaphora resolution in both Spanish and English. For Spanish,
the method was tested on the portion of the LEXESP corpus
previously used to evaluate zero-pronoun detection and
resolution. For English, we tested the method on two kinds of
corpora. First, we used a portion of the SemCor collection,
presented in [Landes et al., 1998], which contains eleven
documents (23,788 words) in which all content words are
annotated with the most appropriate WordNet sense. The SemCor
corpus contains texts on a variety of topics (law, sports,
religion, nature, etc.) written by different authors. Second,
the method was tested on a portion of the MTI corpus, which
contains seven documents (101,843 words) of computer science
manuals on different topics (commercial applications, word
processing applications, device instructions, etc.). Both
English corpora were automatically tagged, each by a different tagger.
We randomly selected a subset of each corpus as training data:
three SemCor documents (6,473 words) and two MTI documents
(24,264 words). The remaining documents of each corpus were
reserved as test data.
For both corpora, the training phase was used to determine the
importance of each kind of knowledge in order to obtain the
optimal ordering of the preferences.
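This section does not spell out how the training phase derives the preference order. One plausible reading can be sketched under the following assumptions. Preferences are boolean tests on candidate antecedents. They are applied in sequence, and each keeps only the candidates that satisfy it unless none do. Training then picks the ordering with the highest accuracy on the training examples. Every name here (`resolve`, `accuracy`, `best_order`, `is_subject`, `same_sentence`) is hypothetical, and the exhaustive search is an illustrative stand-in, not the module's actual procedure.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Candidate = dict  # hypothetical: a candidate antecedent and its features
Preference = Callable[[Candidate], bool]
Example = Tuple[List[Candidate], str]  # (candidates, id of gold antecedent)

def resolve(candidates: List[Candidate], order: Sequence[Preference]) -> Candidate:
    """Apply preferences in order; each keeps only the candidates that
    satisfy it (unless none do). Return the first surviving candidate."""
    remaining = list(candidates)
    for pref in order:
        if len(remaining) == 1:
            break
        preferred = [c for c in remaining if pref(c)]
        if preferred:
            remaining = preferred
    return remaining[0]

def accuracy(examples: List[Example], order: Sequence[Preference]) -> float:
    """Fraction of training examples whose gold antecedent is chosen."""
    correct = sum(resolve(cands, order)["id"] == gold for cands, gold in examples)
    return correct / len(examples)

def best_order(examples: List[Example], preferences: List[Preference]):
    """Exhaustively score every ordering and return the most accurate one."""
    return max(permutations(preferences), key=lambda o: accuracy(examples, o))

# Toy, hypothetical preferences and one training example.
def is_subject(c: Candidate) -> bool:
    return c["subject"]

def same_sentence(c: Candidate) -> bool:
    return c["same_sentence"]

examples = [
    ([{"id": "A", "subject": False, "same_sentence": True},
      {"id": "B", "subject": True, "same_sentence": False}], "B"),
]
order = best_order(examples, [same_sentence, is_subject])
print([p.__name__ for p in order])  # ['is_subject', 'same_sentence']
```

Exhaustive search grows factorially with the number of preferences. It is only practical for a handful of them. A larger preference set would call for a greedy or per-preference importance estimate instead.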
Jesus Peral
2002-12-13