Department of Computing

Unit Outline: COMP348

Document Processing and the Semantic Web

Semester 1, 2008

Convenor: Mark Dras

Prerequisites: 40cp and COMP249(P)

Students should read this unit outline carefully at the start of semester. It contains important information about the unit. If anything in it is unclear, please consult one of the teaching staff in the unit.

About This Unit

COMP348 explores the issues involved in building natural language processing (NLP) applications that operate on large bodies of real text such as are found on the World Wide Web (WWW).

Because the Web is full of unstructured and largely text-based data, the applications needed to handle it have their own particular characteristics. In this unit we discuss some core applications for dealing with data on the Web, such as spam filtering and search engines. The unit also explores some developments of the Web, such as emerging Semantic Web technologies, which support the exchange of XML metadata on the Web, and Web 2.0 technologies (e.g. social networking, folksonomies, wikis and blogs). Application areas covered include information retrieval, web search, document summarisation, machine translation and information extraction.

The unit focuses on the concepts and techniques required to process real natural language text. Students gain practical experience in using the Python programming language to develop language processing systems.

Teaching Staff

Role                    Name            Email                    Room    Office hours
Convenor, Lecturer      Mark Dras       madras AT ics.mq.edu.au  E6A380  Thu 2-3
Lecturer, Tutor         Diego Molla     diego AT ics.mq.edu.au   E6A331  Mon 5-6, Fri 10-11
Practical Demonstrator  Aung Kyaw Htet  ahtet AT ics.mq.edu.au   TBA     TBA

All emails related to COMP348 should be sent to comp348-admin@ics.mq.edu.au and must include your full name and your student ID number.

Classes

Each week you should attend 3 hours of lectures, a one-hour tutorial and a one-hour practical. For details of days, times and rooms consult the timetables webpage.

Note that practical sessions and tutorials commence in week 2.

You should have selected a tutorial and a practical session at enrolment. You should attend the tutorial and practical session you are enrolled in. If you do not have a class, or if you wish to change one, you should see the enrolment operators in the E7B courtyard during the first two weeks of the semester. Thereafter you should go to the Student Centre.

Required and Recommended Texts

There is no set textbook for the unit; readings will be assigned throughout the semester, in conjunction with lecture notes.

Unit Web Page

The web page for this unit can be found at http://www.comp.mq.edu.au/units/comp348. Note that the majority of the unit materials are publicly available while some material requires you to log in to Blackboard CE6 to access it.

The unit will make use of discussion boards hosted within Blackboard CE6. Please post questions there; the boards are monitored by the staff on the unit.

Learning Outcomes

A student completing the unit should have:

  1. A basic understanding of the range of applications that require intelligent text processing.
  2. An understanding of the advantages and disadvantages of shallow and deep techniques for the processing of written text.
  3. An understanding of a variety of shallow and rule-based approaches to intelligent text processing.
  4. Ability to use Python for intelligent text processing.
  5. An understanding of the main techniques involved in statistical approaches to intelligent text processing.
  6. An understanding of the Semantic Web and Web 2.0, their applications and their uses.
  7. Practical ability in implementing an intelligent text processing system.
  8. Practical ability in implementing a Semantic Web or Web 2.0 application.

In addition to the discipline-based learning objectives, all academic programs at Macquarie seek to develop students' generic skills in a range of areas. One of the aims of this unit is that students develop their skills in the following areas:

Teaching and Learning Strategy

COMP348 is taught via lectures, tutorials and practical sessions in the laboratory. Lectures are used to introduce new material, give examples of the use of programming methods and techniques, and put them in a wider context. While lectures are largely one-to-many presentations, you are encouraged to ask the lecturer questions to clarify anything you are unsure of.

Tutorials are small group classes which give you the opportunity to interact with your peers and with a tutor who has a sound knowledge of the subject. You will be given problems to solve each week prior to the tutorial; preparing solutions is important because it will allow you to discuss the problems effectively with your tutor and maximise the feedback you get on your work.

Practical classes give you an opportunity to practise your programming skills under the supervision of a practical demonstrator. Each week you will be given a number of problems to work on; it is important that you keep up with these problems, as doing so will help you understand the material in the unit and prepare you for the work in assignments.

Each week you should:

Lecture notes will be made available each week, but these notes are intended as an outline of the lecture only and are not a substitute for your own notes or the assigned readings.

Topic List

Week  Topic                                                              Reading
1     Introduction + Text Processing with Python
2     Basic Preprocessing: Tokenisation and Morphological Analysis
3     Some Fundamentals of Statistics: Models and Evaluation
4     Machine Learning and Text Classification
5     Machine Learning and Text Classification (cont.) + Grammars
6     Machine Translation
7     Part-of-Speech Tagging and Parsing
      RECESS
8     Word Sense Disambiguation + Information Retrieval
9     Web 2.0: Folksonomies, Wikipedia, and Other Things
10    Summarisation + Information Extraction + Named Entity Recognition
11    Question Answering
12    The Semantic Web
13    Revision

Relationship Between Assessment and Learning Outcomes

  1. A basic understanding of the range of applications that require intelligent text processing: The exam will cover these concepts.
  2. An understanding of the advantages and disadvantages of shallow and deep techniques for the processing of written text: The exam will cover these concepts.
  3. An understanding of a variety of shallow and rule-based approaches to intelligent text processing: The exam will cover these concepts. In addition, the assignments will focus on the approaches of a specific application.
  4. Ability to use Python for intelligent text processing: All programming is done in Python.
  5. An understanding of the main techniques involved in statistical approaches to intelligent text processing: The exam will cover these concepts. In addition, the assignments will use some statistical modelling and require quantitative evaluation.
  6. An understanding of the Semantic Web and Web 2.0, their applications and their uses: The exam and the second assignment will cover this.
  7. Practical ability in implementing an intelligent text processing system: The assignments will focus on this.
  8. Practical ability in implementing a Semantic Web or Web 2.0 application: The second assignment will focus on this.

Task                               Planned Due Date  Total Marks
Assignment 1: Text Classification  weeks 4, 7, 8     25%
Assignment 2: Web 2.0              week 12           15%
Final Examination                  TBA               60%

Your final grade will depend on your performance in each part separately. In particular:

All assignments should be submitted via the online WebCT system at https://learn.mq.edu.au/ by the time specified in the assignment description.

All work submitted should be readable and well presented.

Late work will be accepted with a penalty of 20% of the marks for the assignment for each day it is late. Hence, an assignment submitted five days late will not receive any marks. If you cannot submit on time because of illness or other circumstances, please contact the lecturer before the due date.
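As a rough illustration of the late penalty arithmetic, the following Python sketch computes a penalised mark. It is not an official calculator: the function name and parameters are hypothetical, and it assumes that the 20% is taken from the maximum marks available for the assignment, that lateness is counted in whole days, and that a mark cannot fall below zero. Consult the teaching staff if you are unsure how the penalty applies to your submission.

```python
def late_penalised_mark(raw_mark, max_mark, days_late, rate_per_day=0.20):
    """Return the mark after the late penalty (hypothetical helper).

    Assumption: the penalty is rate_per_day of max_mark for every whole
    day late, and the result is never less than zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty = rate_per_day * max_mark * days_late
    return max(0.0, raw_mark - penalty)


if __name__ == "__main__":
    # An assignment marked 20 out of 25, submitted 0 to 5 days late.
    for days in range(6):
        print(days, late_penalised_mark(20, 25, days))
```

Under these assumptions, any assignment submitted five or more days late receives zero, which matches the rule stated above.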

Examinations

The university examination period in First Half year 2008 is from 11-27 June.

You are expected to present yourself for examination at the time and place designated in the University Examination Timetable. The timetable will be available in Draft form approximately eight weeks before the commencement of the examinations and in Final form approximately four weeks before the commencement of examinations.

You are advised that it is Macquarie University policy not to set early examinations for individuals or groups of students. All students are expected to ensure that they are available until the end of the teaching semester, that is, the final day of the official examination period.

Special Consideration

The only exception to sitting an examination at the designated time is documented illness or unavoidable disruption. In these circumstances you may wish to consider applying for Special Consideration. Information about unavoidable disruption and the special consideration process is available on the web (PDF).

If a Supplementary Examination is granted as a result of the Special Consideration process the examination will be scheduled after the conclusion of the official examination period. For details of the Special Consideration policy specific to the Department of Computing, see the Department's policy page.

To be eligible for special consideration you must show a genuine interest in the unit by participating in its activities. In particular:

Plagiarism

Please refer to the Department of Computing Plagiarism Policy for the definition of plagiarism, advice on avoiding it and the penalties in place if you are found to have submitted plagiarised work.

University Policy on Grading

Academic Senate has a set of guidelines on the distribution of grades across the range from fail to high distinction. Your final result will include one of these grades plus a standardised numerical grade (SNG).

On occasion your raw mark for a unit (i.e., the total of your marks for each assessment item) may not be the same as the SNG which you receive. Under the Senate guidelines, results may be scaled to ensure that there is a degree of comparability across the university, so that units with the same past performances of their students should achieve similar results.

It is important that you realise that the policy does not require that a minimum number of students are to be failed in any unit. In fact it does something like the opposite, in requiring examiners to explain their actions if more than 20% of students fail in a unit.

Student Support Services

Macquarie University provides a range of Academic Student Support Services. Details of these services can be accessed at http://www.student.mq.edu.au.

Staff-Student Liaison Committee

The Department has established a Staff-Student Liaison Committee at each level (100, 200, 300) to provide all students studying a Computing unit the opportunity to discuss related issues or problems with both students and staff.

For each meeting, an agenda is issued and minutes are taken. These are posted on the web at:

Details of the regular meeting dates will be posted on the unit home page. Anyone with an interest in Computing units may attend. This includes staff involved in the teaching and administration of the units, and all students currently taking a Computing unit at that level. There are formal Liaison Committee representatives for each unit who attend to present the views of the student body; all students are welcome and are encouraged to attend.

The meetings are usually held in the Department of Computing Meeting Room, E6A357.

To forward agenda items or get in touch with your representative, send an email to comp348liaison@ics.mq.edu.au.

If you have exhausted all other avenues, then you should consult the Director of Teaching (Dr Steve Cassidy) or the Head of Department (Assoc. Prof. Bernard Mans). You are entitled to have your concerns raised, discussed and resolved.
