
Department of Computing


COMP348 Document Processing and the Semantic Web

Tutorial Week 12

Question Classification

  1. Here is a very simple question classifier in Python:
    import re

    def classify(question):
        """Return the expected answer type."""
        # re.match needs the string to search as its second argument
        if re.match(r'^\s*Who\b', question):
            return "PERSON"
        # ... add more rules here ...
    
    Extend it so that it correctly classifies the following questions. Try to be as general as possible.
    1. When was Karen Spärck Jones born?
    2. Where was Karen Spärck Jones born?
    3. At what organisation did she work during 1974?
    4. What does IDF stand for?
    5. Who was president of the Association for Computational Linguistics in 1994?
  2. Swap your classifier with that of your neighbour and see if you can come up with questions that would be wrongly classified. Extend your neighbour's classifier to handle the new questions.
  3. Come up with a likely set of features that could be used to correctly classify the above questions using a statistical classifier.
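As a starting point for the feature question, here is a minimal sketch of a feature extractor. The function name `extract_features` and the particular features chosen (wh-word, following word, first word, presence of a year, presence of "stand for") are illustrative assumptions, not a prescribed answer; the resulting dictionaries could be passed to any classifier that accepts feature dictionaries.

```python
import re

# Hypothetical set of question words; extend as needed.
WH_WORDS = {"who", "when", "where", "what", "which", "how", "why"}

def extract_features(question):
    """Map a question to a dictionary of simple surface features."""
    lowered = question.lower()
    tokens = re.findall(r"\w+", lowered)
    # First wh-word anywhere in the question, so that
    # "At what organisation ..." still yields "what".
    wh = next((t for t in tokens if t in WH_WORDS), "none")
    wh_index = tokens.index(wh) if wh != "none" else -1
    # Word immediately after the wh-word, e.g. "organisation" or "was".
    if 0 <= wh_index < len(tokens) - 1:
        next_word = tokens[wh_index + 1]
    else:
        next_word = "none"
    return {
        "wh_word": wh,
        "next_word": next_word,
        "first_word": tokens[0] if tokens else "none",
        "has_year": bool(re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", question)),
        "has_stand_for": "stand for" in lowered,
    }
```

For example, `extract_features("At what organisation did she work during 1974?")` gives `wh_word` "what" and `next_word` "organisation", which is the kind of evidence a statistical classifier could use to prefer an organisation answer type over a generic one.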

Question Answering and NLP

The following text (the same text used in the Week 11 tutorial) contains the answers to all of the questions in the first exercise.

Karen Spärck Jones FBA (26 August 1935 - 4 April 2007) was a British computer scientist.

Karen Spärck Jones was born in Huddersfield, Yorkshire, England. Her father was Owen Jones, a lecturer in chemistry, and her mother was Ida Spärck, a Norwegian who moved to Britain during World War II. Spärck Jones was educated at a grammar school and then Girton College, Cambridge from 1953 to 1956, reading History. Initially she became a school teacher.

She worked at Cambridge's Computer Laboratory from 1974, and retired in 2002, holding the post of Professor of Computers and Information. She continued to work in the Computer Laboratory until shortly before her death. Her main research interests, since the late 1950s, were natural language processing and information retrieval. One of her most important contributions was the concept of inverse document frequency (IDF) weighting in information retrieval, which she introduced in a 1972 paper. IDF is used in most search engines today, usually as part of the tf-idf weighting scheme.

Prof. Spärck Jones was a Fellow of the British Academy, of which she was Vice-President in 2000-02. She was also a Fellow of both the AAAI and the ECCAI and was President of the Association for Computational Linguistics in 1994. She received several awards for her research including the Gerard Salton Award (1988), the ASIS&T Award of Merit (2002), the ACL Lifetime Achievement Award (2004), the BCS Lovelace Medal (2007) and the ACM-AAAI Allen Newell Award (2007).

She was married to fellow Cambridge computer scientist Roger Needham until his death in 2003. She died at Willingham in Cambridgeshire.

  1. For each of the questions in the first exercise, discuss which techniques would be most suitable for finding the answer.
  2. Here is a list of NLP tasks. For each of them, explain how it could be used to improve the accuracy of a question answering system.
    1. Information Retrieval
    2. Tokenisation
    3. Parsing
    4. Named Entity Recognition
  3. Here is a list of tools and resources. For each of them, explain how it could be used to improve the accuracy of a question answering system.
    1. Gazetteers (lists of names)
    2. Ontologies, thesauri
    3. Wikipedia
    4. A Web search engine
    5. A logical reasoner
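To make the discussion concrete, here is a rough sketch of how three of the items above (tokenisation, a very crude form of information retrieval, and a gazetteer) might fit together. Everything in it is an assumption made for illustration: the stopword list, the tiny `GAZETTEER` dictionary, and the function names are hypothetical, and the sentence splitter is deliberately naive (it would break on abbreviations such as "Prof.").

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "in", "was", "did", "does", "a", "an", "at",
             "for", "what", "who", "when", "where"}

# Hypothetical gazetteer mapping lower-cased names to entity types.
GAZETTEER = {
    "huddersfield": "LOCATION",
    "yorkshire": "LOCATION",
    "england": "LOCATION",
    "cambridge": "LOCATION",
}

def tokenise(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def split_sentences(text):
    """Naive split after '.', '?' or '!' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

def best_sentence(question, text):
    """Return the sentence sharing the most content words with the question."""
    query = set(tokenise(question)) - STOPWORDS
    return max(split_sentences(text),
               key=lambda s: len(query & set(tokenise(s))))

def candidates(sentence, answer_type):
    """Return words in the sentence that the gazetteer lists under answer_type."""
    return [w for w in re.findall(r"\w+", sentence)
            if GAZETTEER.get(w.lower()) == answer_type]
```

A possible usage: call `best_sentence("Where was Karen Spärck Jones born?", text)` to pick the most relevant sentence, then `candidates(sentence, "LOCATION")` to list gazetteer matches. Word overlap alone cannot decide which of several locations is the answer, which is where parsing or a reasoner could help.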

Comments to: Mark Dras or Diego Molla

Copyright Macquarie University
CRICOS provider no. 00002J