On this page you can find information about the resources we distribute:
- ARRAU: This is a corpus with very rich anaphoric and semantic information (including information about genericity, reference to abstract objects, etc.), containing English texts from very different genres (including the RST Discourse Treebank). It will soon become available through LDC and Anaphoric Bank; for the moment, just write to me.
- DECOUR: This is a corpus of transcripts of witness statements in Italian courts, used in our work on deception detection. Soon to be made available.
- GNOME: This is a corpus of English texts from three domains (museum descriptions, pharmaceutical leaflets, and the Sherlock corpus of tutorial dialogues), annotated with very rich anaphoric and semantic information and used to study salience and its effect on referring expression generation. The site gives information about the corpus; ask me if you would like a copy.
- LiveMemories-Anaphora: A corpus of Italian texts annotated with a scheme very similar to that of ARRAU. Two domains: Wikipedia and blogs. Again, ask me.