Department of Computing


COMP348 Document Processing and the Semantic Web

Tutorial Week 2

Language Technology and Python

Applications that Benefit from Intelligent Text Processing

In the lectures you have seen some examples of applications that benefit from language technology. That list is by no means exhaustive. In this first exercise you will discuss other applications in groups.

  1. In groups of two, come up with as many applications as possible, regardless of whether they are currently available or not. (Think Star Trek, Stargate, Minority Report, Dr Who, Torchwood, ...)

  2. Combine your group with another group, discuss how complex each of these applications is likely to be and, accordingly, classify each one into one of these categories:

    1. Commercially available today
    2. Achievable with the current technology
    3. Achievable in less than 10 years
    4. Achievable in less than 100 years
    5. Not achievable in less than 100 years
    6. Not achievable at all

Regular Expressions in Python

  1. What would each of the following REs match in the given string?

    1. .+ in 'this is a string'
    2. \([0-9]+\)\s[0-9]+ in 'the number is (02) 9850 9581'
    3. \w+ in 'this is a string'
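One way to check your answers is to try each RE in the interpreter. The sketch below assumes Python's standard re module; the helper name first_match is made up for this illustration. Note that re.search reports only the leftmost match, while re.findall lists every non-overlapping match, so the two can give different pictures of the same RE.

```python
import re

def first_match(pattern, string):
    """Return the text of the leftmost match of pattern in string, or None."""
    m = re.search(pattern, string)
    return m.group(0) if m else None

# The three REs from the exercise, written as raw strings so that
# backslashes reach the re module unchanged.
exercises = [
    (r".+", "this is a string"),
    (r"\([0-9]+\)\s[0-9]+", "the number is (02) 9850 9581"),
    (r"\w+", "this is a string"),
]

for pattern, string in exercises:
    print(pattern, "->", repr(first_match(pattern, string)))
    print("  all matches:", re.findall(pattern, string))
```

Work out your own answers first, then run the loop and compare.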

  2. If you wanted to match dates of the form:

    1. 20 March 2003
    2. March 20, 2003
    3. 20/3/03
    4. 20-Mar-03

    could you write a single RE? If not, how might you handle the problem? Sketch Python code that might be able to deal with all four.
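A single RE covering all four formats is possible but quickly becomes hard to read. One possible approach, sketched below, keeps one small RE per format and tries each in turn. The names MONTH, DATE_PATTERNS and find_dates are invented for this sketch, and the patterns make simplifying assumptions: English month names only, two-digit years in the short formats, and no check that the day or month number is valid.

```python
import re

# Full or three-letter English month names (an assumption of this sketch).
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
         r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
         r"Dec(?:ember)?)")

# One compiled pattern per date format from the exercise.
DATE_PATTERNS = [
    re.compile(r"\b\d{1,2} " + MONTH + r" \d{4}\b"),   # 20 March 2003
    re.compile(r"\b" + MONTH + r" \d{1,2}, \d{4}\b"),  # March 20, 2003
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2}\b"),          # 20/3/03
    re.compile(r"\b\d{1,2}-" + MONTH + r"-\d{2}\b"),   # 20-Mar-03
]

def find_dates(text):
    """Return every date found in text, grouped by pattern order."""
    found = []
    for pattern in DATE_PATTERNS:
        found.extend(m.group(0) for m in pattern.finditer(text))
    return found

for example in ["20 March 2003", "March 20, 2003", "20/3/03", "20-Mar-03"]:
    print(example, "->", find_dates(example))
```

Keeping the patterns in a list makes it easy to add a fifth format later without touching the matching code.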

More Regular Expressions in Python

Sometimes words are hyphenated across lines; consider the following text:

At the period when these events took place, I had just returned
from a scientific research in the disagreeable territory
of Nebraska, in the United States.  In virtue of my office
as Assistant Professor in the Museum of Natural History in Paris,
the French Government had attached me to that expedition.
After six months in Nebraska, I arrived in New York to-
wards the end of March, laden with a precious collection.
My departure for France was fixed for the first days in May.
Meanwhile I was occupying myself in classifying my minera-
logical, botanical, and zoological riches, when the accident 
happened to the Scotia.

Write a Python function that counts how many times a word appears in the text, 'repairing' the hyphen breaks. Take care to match whole words only. For example, suppose that the above text is stored in the variable text. Then:

>>> myCount('Nebraska', text)
2
>>> myCount('towards', text)
1
>>> myCount('erio', text)
0
>>>
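A minimal sketch of one possible solution follows. It assumes that every hyphen at the end of a line marks a broken word, so a genuine hyphenated compound that happens to be split at a line end would lose its hyphen. The variable sample is a short excerpt of the text above, used here only for illustration.

```python
import re

def myCount(word, text):
    """Count whole-word occurrences of word after rejoining hyphen breaks."""
    # Remove a hyphen at the end of a line, together with the line break
    # and any leading whitespace on the next line: 'to-\nwards' -> 'towards'.
    repaired = re.sub(r"-[ \t]*\n\s*", "", text)
    # \b anchors the match at word boundaries, so 'erio' does not match
    # inside 'period'. re.escape guards against special characters in word.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, repaired))

sample = (
    "At the period when these events took place, I had just returned\n"
    "After six months in Nebraska, I arrived in New York to-\n"
    "wards the end of March, laden with a precious collection.\n"
)
print(myCount("towards", sample))
print(myCount("erio", sample))
```

The match is case-sensitive; passing re.IGNORECASE to re.findall would be one way to change that.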

Comments to: Mark Dras or Diego Molla

Copyright Macquarie University
CRICOS provider no. 00002J