Massimo Poesio

Text mining and classification

Much of my research, and my funded research in particular, applies computational linguistics (CL) methods to practical problems such as detecting deceptive language and managing big data in digital libraries or on social media, among other areas. Many of these projects are collaborations with Udo Kruschwitz.

Deception detection using stylometric methods

Methods for discovering whether assertions are reliable or deceptive could have a variety of applications, e.g., in court, or to assess the reliability of online reviews. In collaboration with my former PhD student Tommaso Fornaciari, I have studied the application of stylometric techniques (techniques that use statistics about the occurrence of function words to determine psychological traits of the author of a text) to evaluate the reliability of witness statements in court (Fornaciari and Poesio, 2013) and of Amazon reviews (Fornaciari and Poesio, 2014). In collaboration with another former PhD student, Fabio Celli, we have also explored the use of personality identification techniques for this task (Fornaciari et al., 2013).
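
As a rough illustration of what "statistics about the occurrence of function words" can look like in practice, the sketch below computes the relative frequency of a handful of English function words in a statement. It is not the feature set or pipeline used in the papers cited above: the word list `FUNCTION_WORDS` and the helper `function_word_profile` are hypothetical names chosen for this example, and real studies rely on much larger, language-specific inventories.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption made
# for this example; not taken from the cited work).
FUNCTION_WORDS = ("i", "we", "you", "the", "a", "not", "but", "and", "of", "to")


def function_word_profile(text):
    """Return the relative frequency of each function word in `text`."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


profile = function_word_profile("I did not see the car, but we heard the noise.")
```

A vector of this kind, computed per statement, is the sort of input a standard text classifier could then be trained on to separate reliable from deceptive statements.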

Mining social media

Social media are a very rich source of information of different kinds. Many of my current text mining projects are concerned with extracting information from social media.

One example of a useful application of these techniques is the recently started KTP project with Minority Rights Group on mining Arabic social media to identify possible reports of human rights abuse.

A second application is an ongoing collaboration with BT on using event extraction methods for predictive purposes, e.g., to predict future traffic jams.

Text mining in the Digital Humanities

Digital libraries are another rich source of textual data. In the GALATEAS project we applied information extraction techniques to analyze query logs. In an ongoing collaboration with the Bagolini Archaeological Lab at the University of Trento, we have been developing NER techniques to facilitate the upload and search of scholarly articles in Archaeology. More recently, we have focused in particular on the use of active learning methods to improve the quality of our mining methods.

Projects (in inverse chronological order)

  • Improving Reporting of Human Right Abuses through Arabic Social Media (2014-17), a KTP collaboration between Essex and Minority Rights Group funded by Innovate UK.
  • SENSEI (2013-16), an EU project on using discourse to summarize online conversations.
  • GALATEAS (2009-12), an EU project on using text mining to analyze query logs.
  • LiveMemories (2006-2010), a large project on using text mining to support creation of shared knowledge funded by the Provincia di Trento.

Main publications

  • Francesca Bonin, Asif Ekbal, Sriparna Saha, Utpal Sikdar, Fabio Cavulli, Aronne Noriller, Ans Alghamdi, and Massimo Poesio. Named Entity Recognition in the Humanities: The Case of the Archaeology Domain. Submitted.
  • Asif Ekbal, Francesca Bonin, Sriparna Saha, Utpal Sikdar, Fabio Cavulli, Ans Alghamdi, and Massimo Poesio. Active Learning and Ensemble Construction for NER in the Archeological Domain. Submitted.
  • Maha Althobaiti, Udo Kruschwitz, and Massimo Poesio, 2015. Combining Minimally Supervised Methods for Arabic Named Entity Recognition. Transactions of the ACL. (pdf)
  • Alghamdi, A., F. Bonin, A. Ekbal, S. Saha, F. Cavulli, S. Tonelli, M. Poesio, and U. Kruschwitz, 2014. Active Expert Learning for the Digital Humanities. In Proceedings of STRIX, Gothenburg, November.
  • Fornaciari, T. and M. Poesio, 2014. Identifying fake Amazon reviews as learning from crowds. Proc. of EACL, Gothenburg, April.
  • Fornaciari, T., F. Celli, and M. Poesio, 2013. The Effect of Personality Type on Deceptive Communication Style. In Proc. of FORTAN, Uppsala, August.
  • Lungley, D., M. Poesio, M. Trevisan, M. Althobaiti and V. Nguyen, 2013. GALATEAS D2W: A Multi-lingual Disambiguation to Wikipedia Web Service. In Proc. of ENRICH, Dublin, August.
  • Tommaso Fornaciari and Massimo Poesio, 2013. Automatic deception detection in Italian court cases. Journal of AI and Law. 21(3), 303--340. (pdf)
    • This paper was discussed in the Wall Street Journal.
  • Nguyen, T.-V. and M. Poesio, 2012. Entity disambiguation and linking over queries using encyclopedic knowledge. In Proc. of the Sixth Workshop on Analytics for Noisy Unstructured Text Data (AND 2012, in conjunction with COLING 2012), Mumbai, India.
  • Bonin, F., F. Cavulli, M. Poesio, and E. W. Stemle, 2012. Annotating Archaeological Texts: An Example of Domain-Specific Annotation in the Humanities. In Proc. of the Sixth Linguistic Annotation Workshop (LAW) at ACL 2012, Jeju, Korea, pp. 134-138. (pdf)
  • Fornaciari, T. and M. Poesio, 2012. DECOUR: A corpus of Deceptive Statements in Italian Courts. In Proc. of LREC, Istanbul.
  • Fornaciari, T. and M. Poesio, 2012. On the Use of Homogenous Sets of Subjects in Deceptive Language Analysis. In Proc. of EACL Workshop on Computational Approaches to Deception Detection, Avignon. (pdf)
  • Asif Ekbal, Francesca Bonin, Sriparna Saha, Egon Stemle, Eduard Barbu, Fabio Cavulli, Christian Girardi, and Massimo Poesio, 2011. Rapid Adaptation of NE Resolvers for Humanities Domains using Active Annotation. Journal for Language Technology and Computational Linguistics, 26(20), 39–51.
  • Josef Steinberger, Massimo Poesio, Mijail Kabadjov, and Karel Jezek, 2007. Two uses of anaphora resolution in summarization. Information Processing and Management, v. 43, n. 6, 1663-1680. Special issue on Summarization (Donna Harman, ed.). (pdf of preliminary version)
  • Olivia Sanchez-Graillet and Massimo Poesio, 2007. Negation of protein-protein interactions. Bioinformatics, v. 23, n. 13, 424-432. (Full content of article in .html)