Language and Computation Seminar, 2005/06:
Empirical Methods in NLP
This year's seminar is about how to design an experiment,
both in general and with specific application to NLP; how to test a
hypothesis; and, more generally, how to evaluate an NLP system. We will try to
use examples from anaphora resolution, but we will also read experimental work
from other areas of NLP.
The most novel feature of this year's seminar is that it
is going to be an AUDIENCE PARTICIPATION SEMINAR, meaning that the participants
(you) are all expected to present some material; Massimo will give only a few
presentations. So have a look at the topics identified below and decide what
you'd like to read.
This term, the seminar will meet in the Colloquium Room in
Computer Science (5A.540, next to Massimo's office), Tuesdays, 11-12:45.
This page:
http://cswww.essex.ac.uk/LAC/LAC_empirical_seminar_syllabus.html
- Primary Text:
- Paul Cohen, Empirical Methods in AI,
MIT Press, 1995
- Supplementary Readings I: Statistics
- Woods, Fletcher, and Hughes, Statistics in
Language Studies. Cambridge.
- R. Kirk, Experimental Design, Brooks / Cole
- Experimental design (in psychology): a first
introduction
- October 11th, Sonja
- Readings:
- Sonja's handout
- Cohen, ch. 3
- Kirk, chapter 1
- Hypothesis testing: a first introduction
- October 18th / 25th, Ron (October 25th: also Roman Tesar,
text classification using n-grams)
- Experimental design II: Latin Square design
- November 1st: Udo's JASIST paper
- November 8th: Richard's TREC paper
- Hypothesis testing II: the t-test and its applications
- November 15th: t-test (Mijail)
- Dietterich, 1998
- For a more basic intro, see Cohen ch. 4 / Woods and Hughes ch. 8
- November 22nd: use of t-test to compare the performance
of anaphora resolution systems:
- Evaluation in NLP & Anaphora Resolution (November
29th)
- Evaluation in MUC
- Evaluation in Anaphora Resolution:
- Evaluation in Information Extraction:
- Hypothesis testing III: Computer-intensive methods
(December 6th)
- General motivation (for which kinds of population
parameters can't you use the t-test?)
- Readings: Cohen, ch. 5 (handouts still available from
Massimo)
- Further readings:
- The standard reading is Noreen, E. W., (1989), Computer
intensive methods for testing hypotheses, John Wiley and Sons (but our library
doesn't have it)
- Alternatives: Manly, B. F., Randomization and Monte-Carlo
methods in biology, Chapman and Hall, 1991
- Edgington, E. S., Statistical inference: the
distribution-free approach, McGraw-Hill, 1969.
- Experimental design, IV: Power calculations
(December 13th)
- December 13th: Nancy
- Main readings: Cohen, ch. 4
- Further reading: R. Kirk, ch. 1 (t-test)
- January 10th: Riccardo Russo
- Further reading suggested by Riccardo:
- January 17th: An example of power calculations -
dual models vs connectionist explanations of morphology (Sonja)
- Hypothesis Testing IV: ANOVA
- January 24th, Theoretical introduction (Ron)
- Readings: Cohen ch. 7? Woods-Hughes ch. 12? Kirk
chapter 5?
- ANOVA in psychology:
- The Poesio et al. 2001 paper on underspecification in anaphora?
- The Spivey et al paper?
- ANOVA in NLP: examples
- Experimental design V: examples of good practice in experimental design in AR & NLP
- (Ron?) Frank Keller and Mirella Lapata, Using the Web to obtain
Frequencies for Unseen Bigrams, Computational Linguistics v. 29, n. 3, 2003
- (Olivia?) Dan Gildea and Dan Jurafsky, Automatic labelling of semantic
roles, Computational Linguistics, 2002
- (Mijail) One of the papers by Veronique Hoste from Walter Daelemans'
group, for instance Comparing Learning Approaches to coreference resolution
- (Mijail) Kehler et al., The non-utility of predicate-argument frequencies
for pronoun interpretation, NAACL 2004
- Also possible: Lapata CL 2002
- Hypothesis testing V: Chi-square
- Basic intro: Olivia?
- Readings: Woods and Hughes ch. 9?
- Applications to anaphora / NLP
- Readings: Poesio to appear??
- An alternative to Chi-square: log-likelihood (Dunning, CL 1993)
- Experimental design, III: Sample design
- General intro
- Corpora used in AR / NLP
- Hypothesis Testing VI: Other distributions
- Binomial & the sign test
- Poisson
- Additional forms of performance assessment
- Learning curves (Richard? Mijail?)
- Readings: Cohen ch. 6
- Maybe also go back to Cohen ch. 2 (advanced
visualization)
- Analysis of a decision tree
- Feature selection
- More Hypothesis Testing:
- Linear regression
- Readings: Woods-Hughes ch. 13?
- Logistic regression
- Magnitude estimation
- Improving the performance of ML systems