Department of Computing

COMP348 Document Processing and the Semantic Web

Lecture Notes

Lecture slides will be released as soon as they're available each week. The links in the table below will give you a PDF version of the slides.

Week | Lectures | Assignments
1: 25th February | Unit Introduction; Text Processing in Python
2: 3rd March | Tokenisation and Sentence Segmentation; Morphological Analysis | A1 Part 1 available
3: 10th March | Statistical Methods; Statistical Methods and Evaluation
4: 17th March | Text Classification and Machine Learning | A1 Part 1 due (Sun 23 Mar, 11.59pm); A1 Part 2 available
5: 24th March | Text Classification Features; Text Classification Examples
6: 31st March | Grammar; Machine Translation; Statistical Machine Translation | A1 Part 3 available
7: 7th April | Part of Speech Tagging; Parsing | A1 Part 2 due (Sun 13 Apr, 11.59pm)
RECESS
8: 28th April | Word Sense Disambiguation; Information Retrieval | A1 Part 3 due (Sun 4 May, 11.59pm)
9: 5th May | Graphs for Language Technology | A2 available (Draft)
10: 12th May | Document Summarisation; Information Extraction; Named Entity Recognition
11: 19th May | Question Answering
12: 26th May | The Semantic Web | A2 due (Sun 1 June, 11.59pm)
13: 5th June | Review/Exam Questions; Review/Exam Questions

Lecturers: Mark Dras, Diego Mollá

Reading

Here are some supplementary readings to accompany various weeks.

Week Topic Article
2: 3rd March NLTK on tokenisation http://nltk.org/doc/en/words.html
  Porter Stemmer description http://www.tartarus.org/~martin/PorterStemmer/def.txt
  Porter Stemmer (Python) http://www.tartarus.org/~martin/PorterStemmer/python.txt
3: 10th March Statistics B. Krenn & C. Samuelsson (1997) The Linguist's Guide to Statistics: Don't Panic, Chapter 3 [local copy]
  Statistics HyperStat Online Statistics Textbook
(note that the test of proportions there uses a slightly different formula from the one in the lecture notes)
4: 17th March Text Classification Methods Yiming Yang, Xin Liu (1999). A Re-Examination of Text Categorization Methods. [local copy]
  CMU Classification Data Collection http://www-2.cs.cmu.edu/~wcohen/
  WebKB Data http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
5: 24th March Feature Selection in Classification Yang and Pedersen (1997). A Comparative Study on Feature Selection in Text Categorization [local copy]
  Feature Engineering in Text Classification Scott & Matwin (1999). Feature Engineering in Text Classification [local copy]
  Bayesian Spam Filtering M. Sahami, S. Dumais, D. Heckerman, E. Horvitz (1998). A Bayesian approach to filtering junk e-mail
  Paul Graham's "A Plan for Spam" http://www.paulgraham.com/spam.html
  Paul Graham's "Better Bayesian Filtering" http://www.paulgraham.com/better.html
  Sentiment Classification (Turney) Turney, P.D. (2002), Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews
  Sentiment Classification (Pang and Lee) Bo Pang and Lillian Lee (2004), A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts
  Blog Profiling J. Schler, M. Koppel, S. Argamon and J. Pennebaker (2006). Effects of Age and Gender on Blogging
  Email Profiling Dominique Estival, Tanja Gaustad, Ben Hutchinson, Son Bao Pham and Will Radford (2007). Author Profiling for English Emails.
  SVM Light http://svmlight.joachims.org/
7: 7th April Part of Speech Tagging E. Brill (1992) A Simple Rule-Based Part of Speech Tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy.
  Part of Speech Tagging NLTK chapter 4 "Categorizing and Tagging Words" [HTML] [PDF]
  Grammars and Parsing NLTK chapter 8 "Context Free Grammars and Parsing" [HTML] [PDF]
  Chart Parsing NLTK chapter 9 "Chart Parsing and Probabilistic Parsing" [HTML] [PDF]
8: 28th April Word Sense Disambiguation Ide & Veronis (1998) Word Sense Disambiguation: The State of the Art
9: 5th May PageRank What is PageRank?
  Graph Theory for NLP Mihalcea & Radev (2006) Graph-based Algorithms for Information Retrieval and Natural Language Processing
10: 12th May Document Summarisation E. H. Hovy, Automated Text Summarization, In: R. Mitkov (ed), The Oxford Handbook of Computational Linguistics, pp. 583-598, 2003. See Blackboard's private resources.
  Information Extraction D. Appelt and D. Israel. Introduction to Information Extraction Technology (1999)
  Information Extraction Hobbs et al. FASTUS (1997)
  Information Extraction R. Grishman, Information Extraction, In: R. Mitkov (ed), The Oxford Handbook of Computational Linguistics, pp. 545-559, 2003. In the Library.
  Named Entity Recognition David Nadeau, Satoshi Sekine. A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 2007.
11: 19th May Question Answering D. Mollá and J.L. Vicedo. Draft paper "Open-domain Question Answering Technology: State of the Art and Future Trends".
12: 26th May Semantic Web Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001.

Comments to: Mark Dras or Diego Mollá