Here are some supplementary readings to accompany various weeks.
| Week |
Topic |
Article |
| 2: 3rd March |
NLTK on tokenisation |
http://nltk.org/doc/en/words.html |
| |
Porter Stemmer description |
http://www.tartarus.org/~martin/PorterStemmer/def.txt |
| |
Porter Stemmer (Python) |
http://www.tartarus.org/~martin/PorterStemmer/python.txt |
| 3: 10th March |
Statistics |
B. Krenn & C. Samuelsson (1997) The Linguist's Guide to Statistics: Don't Panic, Chapter 3 [local copy] |
| |
Statistics |
HyperStat Online Statistics Textbook (note that its tests of proportions use a slightly different formula from the lecture notes) |
| 4: 17th March |
Text Classification Methods |
Yiming Yang, Xin Liu (1999). A Re-Examination of Text Categorization Methods. [local copy] |
| |
CMU Classification Data Collection |
http://www-2.cs.cmu.edu/~wcohen/ |
| |
WebKB Data |
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/ |
| 5: 24th March |
Feature Selection in Classification |
Yang and Pedersen (1997). A Comparative Study on Feature Selection in Text Categorization [local copy] |
| |
Feature Engineering in Text Classification |
Scott & Matwin (1999). Feature Engineering in Text Classification [local copy] |
| |
Bayesian Spam Filtering |
M. Sahami, S. Dumais, D. Heckerman, E. Horvitz (1998). A Bayesian approach to filtering junk e-mail |
| |
Paul Graham's "A Plan for Spam" |
http://www.paulgraham.com/spam.html |
| |
Paul Graham's "Better Bayesian Filtering" |
http://www.paulgraham.com/better.html |
| |
Sentiment Classification (Turney) |
Turney, P.D. (2002), Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews |
| |
Sentiment Classification (Pang and Lee) |
Bo Pang and Lillian Lee (2004), A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts |
| |
Blog Profiling |
J. Schler, M. Koppel, S. Argamon and J. Pennebaker (2006). Effects of Age and Gender on Blogging |
| |
Email Profiling |
Dominique Estival, Tanja Gaustad, Ben Hutchinson, Son Bao Pham and Will Radford (2007). Author Profiling for English Emails. |
| |
SVM Light |
http://svmlight.joachims.org/ |
| 7: 7th April |
Part of Speech Tagging |
E. Brill (1992) A Simple Rule-Based Part of Speech Tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy. |
| |
Part of Speech Tagging |
NLTK chapter 4 "Categorizing and Tagging Words" [HTML] [PDF] |
| |
Grammars and Parsing |
NLTK chapter 8 "Context Free Grammars and Parsing" [HTML] [PDF] |
| |
Chart Parsing |
NLTK chapter 9 "Chart Parsing and Probabilistic Parsing" [HTML] [PDF] |
| 8: 28th April |
Word Sense Disambiguation |
Ide & Veronis (1998) Word Sense Disambiguation: The State of the Art |
| 9: 5th May |
PageRank |
What is PageRank? |
| |
Graph Theory for NLP |
Mihalcea & Radev (2006) Graph-based Algorithms for Information Retrieval and Natural Language Processing |
| 10: 12th May |
Document Summarisation |
E. H. Hovy, Automated Text Summarization, In: R. Mitkov (ed), The Oxford Handbook of Computational Linguistics, pp. 583-598, 2003. See Blackboard's private resources. |
| |
Information Extraction |
D. Appelt and D. Israel. Introduction to Information Extraction Technology (1999) |
| |
Information Extraction |
Hobbs et al. FASTUS (1997) |
| |
Information Extraction |
R. Grishman, Information Extraction, In: R. Mitkov (ed), The Oxford Handbook of Computational Linguistics, pp. 545-559, 2003. In the Library. |
| |
Named Entity Recognition |
David Nadeau, Satoshi Sekine. A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 2007. |
| 11: 19th May |
Question Answering |
D. Mollá and J.L. Vicedo. Draft paper "Open-domain Question Answering Technology: State of the Art and Future Trends". |
| 12: 26th May |
Semantic Web |
Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001. |