Unit Outline: COMP348
Document Processing and the Semantic Web
Semester 1, 2008
Convenor: Mark Dras
Prerequisites: 40cp and COMP249(P)
Students should read this unit outline carefully at the start of semester. It contains important information about the unit. If anything in it is unclear, please consult one of the teaching staff in the unit.
About This Unit
COMP348 explores the issues involved in building natural language processing (NLP) applications that operate on large bodies of real text such as are found on the World Wide Web (WWW).
With the Web being full of unstructured and largely text-based data, the applications needed to handle this have their own particular characteristics. In this unit we discuss some core applications for dealing with data on the Web, such as spam filtering and search engines. The unit also explores some developments of the Web, such as emerging semantic web technologies which support the exchange of XML metadata on the Web, and Web 2.0 technologies (e.g. social networking, folksonomies, wikis and blogs). Application areas covered include information retrieval, web search, document summarisation, machine translation and information extraction.
The unit focuses on the concepts and techniques required to process real natural language text. Students gain practical experience in using the Python programming language to develop language processing systems.
Teaching Staff
| Role | Name | Email | Room | Office hours |
|---|---|---|---|---|
| Convenor, Lecturer | Mark Dras | madras AT ics.mq.edu.au | E6A380 | Thu 2-3 |
| Lecturer, Tutor | Diego Molla | diego AT ics.mq.edu.au | E6A331 | Mon 5-6, Fri 10-11 |
| Practical Demonstrator | Aung Kyaw Htet | ahtet AT ics.mq.edu.au | TBA | TBA |
All emails related to COMP348 should be sent to comp348-admin@ics.mq.edu.au and must include your full name and your student ID number.
Classes
Each week you should attend 3 hours of lectures, a one-hour tutorial and a one-hour practical. For details of days, times and rooms consult the timetables webpage.
Note that practical sessions and tutorials commence in week 2.
You should have selected a tutorial and a practical session at enrolment. You should attend the tutorial and practical session you are enrolled in. If you do not have a class, or if you wish to change one, you should see the enrolment operators in the E7B courtyard during the first two weeks of the semester. Thereafter you should go to the Student Centre.
Required and Recommended Texts
There is no set textbook for the unit; readings will be assigned throughout the semester, in conjunction with lecture notes.
Unit Web Page
The web page for this unit can be found at http://www.comp.mq.edu.au/units/comp348. Note that the majority of the unit materials are publicly available while some material requires you to log in to Blackboard CE6 to access it.
The unit will make use of discussion boards hosted within Blackboard CE6. Please post questions there; they will be monitored by the staff on the unit.
Learning Outcomes
A student completing the unit should have:
- A basic understanding of the range of applications that require intelligent text processing.
- An understanding of the advantages and disadvantages of shallow and deep techniques for the processing of written text.
- An understanding of a variety of shallow and rule-based approaches to intelligent text processing.
- Ability to use Python for intelligent text processing.
- An understanding of the main techniques involved in statistical approaches to intelligent text processing.
- An understanding of the Semantic Web and Web 2.0, their applications and their uses.
- Practical ability in implementing an intelligent text processing system.
- Practical ability in implementing a Semantic Web or Web 2.0 application.
In addition to the discipline-based learning objectives, all academic programs at Macquarie seek to develop students' generic skills in a range of areas. One of the aims of this unit is that students develop their skills in the following areas:
- Foundation skills of literacy, numeracy and information technology, in particular in the ability to quantitatively evaluate applications;
- Communication skills, in particular in reporting on the development and evaluation of applications;
- Critical analysis skills;
- Self-awareness and interpersonal skills;
- Problem-solving skills, in developing algorithms for producing applications and elsewhere;
- Creative thinking skills.
Teaching and Learning Strategy
COMP348 is taught via lectures, tutorials and practical sessions in the laboratory. Lectures are used to introduce new material, give examples of the use of programming methods and techniques, and put them in a wider context. While lectures are largely one-to-many presentations, you are encouraged to ask questions of the lecturer to clarify anything you might not be sure of.
Tutorials are small group classes which give you the opportunity to interact with your peers and with a tutor who has a sound knowledge of the subject. You will be given problems to solve each week prior to the tutorial; preparing solutions is important because it will allow you to discuss the problems effectively with your tutor and maximise the feedback you get on your work.
Practical classes give you an opportunity to practise your programming skills under the supervision of a practical demonstrator. Each week you will be given a number of problems to work on; it is important that you keep up with these problems, as doing so will help you understand the material in the unit and prepare you for the work in assignments.
Each week you should:
- Attend lectures, take notes, ask questions.
- Attend your tutorial, seek feedback from your tutor on your work.
- Attend the practical session, do as many of the practical problems as you can and seek feedback from the practical demonstrator on your work.
- Read the assigned readings, add to your notes and prepare questions for your lecturer or tutor.
- Prepare answers to the following week's tutorial questions.
- Work on any assignments that have been released.
Lecture notes will be made available each week but these notes are intended as an outline of the lecture only and are not a substitute for your own notes or the textbook.
Topic List
| Week | Topic | Reading |
|---|---|---|
| 1 | Introduction + Text Processing with Python | |
| 2 | Basic Preprocessing: Tokenisation and Morphological Analysis | |
| 3 | Some Fundamentals of Statistics: Models and Evaluation | |
| 4 | Machine Learning and Text Classification | |
| 5 | Machine Learning and Text Classification (cont.) + | |
| 6 | Machine Translation | |
| 7 | Part-of-Speech Tagging and Parsing | |
| RECESS | | |
| 8 | Word Sense Disambiguation + Information Retrieval | |
| 9 | Web 2.0: Folksonomies, Wikipedia, and Other Things | |
| 10 | Summarisation + Information Extraction + Named Entity Recognition | |
| 11 | Question Answering | |
| 12 | The Semantic Web | |
| 13 | Revision | |
Relationship Between Assessment and Learning Outcomes
- A basic understanding of the range of applications that require intelligent text processing: The exam will cover these concepts.
- An understanding of the advantages and disadvantages of shallow and deep techniques for the processing of written text: The exam will cover these concepts.
- An understanding of a variety of shallow and rule-based approaches to intelligent text processing: The exam will cover these concepts. In addition, the assignments will focus on the approaches of a specific application.
- Ability to use Python for intelligent text processing: All programming is done in Python.
- An understanding of the main techniques involved in statistical approaches to intelligent text processing: The exam will cover these concepts. In addition, the assignments will use some statistical modelling and require quantitative evaluation.
- An understanding of the Semantic Web and Web 2.0, their applications and their uses: The exam and the second assignment will cover this.
- Practical ability in implementing an intelligent text processing system: The assignments will focus on this.
- Practical ability in implementing a Semantic Web or Web 2.0 application: The second assignment will focus on this.
| Task | Planned Due Date | Total Marks |
|---|---|---|
| Assignment 1: Text Classification | weeks 4, 7, 8 | 25% |
| Assignment 2: Web 2.0 | week 12 | 15% |
| Final Examination | TBA | 60% |
Your final grade will depend on your performance in each part separately. In particular:
- You must perform satisfactorily in the examination in order to pass this unit.
- You must get at least 25% of the maximum marks of the combined assignment submissions (that is, 10% of the total unit assessment) to pass this unit.
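The arithmetic behind the 10% figure can be checked with a short Python sketch. The weights are taken from the assessment table above; the function and variable names are illustrative only and are not part of any unit software.

```python
# Assignment weights as percentages of the total unit assessment
# (taken from the assessment table in this outline).
ASSIGNMENT_WEIGHTS = {"Assignment 1": 25, "Assignment 2": 15}

# Minimum share of the combined assignment marks required to pass.
HURDLE_PERCENT = 25


def assignment_hurdle(weights=ASSIGNMENT_WEIGHTS, hurdle_percent=HURDLE_PERCENT):
    """Return the hurdle as a percentage of the total unit assessment."""
    return sum(weights.values()) * hurdle_percent / 100


def meets_assignment_hurdle(marks):
    """marks maps each assignment to the unit percentage points earned."""
    return sum(marks.values()) >= assignment_hurdle()
```

Here the combined assignments are worth 40% of the unit, and 25% of 40 is 10, which matches the figure quoted in the requirement.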
All assignments should be submitted via the online WebCT system at https://learn.mq.edu.au/ by the time specified in the assignment description.
All work submitted should be readable and well presented.
Late work will be accepted with a penalty of 20% of the marks for the assignment per day submitted late. Hence, an assignment submitted five days late will not get any marks. If you cannot submit on time because of illness or other circumstances, please contact the lecturer before the due date.
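As a rough illustration of the late penalty, the Python sketch below assumes that the 20% per day is taken from the maximum marks available for the assignment and deducted from the mark awarded, and that lateness is counted in whole days. The outline does not spell out either detail, so treat both as assumptions; the function name and parameters are hypothetical.

```python
def late_mark(raw_mark, max_mark, days_late, penalty_percent_per_day=20):
    """Mark after the late penalty, under the assumptions stated above."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Total deduction as a percentage of the maximum mark, capped at 100%.
    penalty_percent = min(100, penalty_percent_per_day * days_late)
    deduction = max_mark * penalty_percent / 100
    # Assumption: a mark never drops below zero.
    return max(0.0, raw_mark - deduction)
```

For example, under this reading a submission awarded 20 out of 25 and handed in one day late would lose 5 marks, and any submission five or more days late would receive zero.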
Examinations
The university examination period in First Half year 2008 is from 11-27 June.
You are expected to present yourself for examination at the time and place designated in the University Examination Timetable. The timetable will be available in Draft form approximately eight weeks before the commencement of the examinations and in Final form approximately four weeks before the commencement of examinations.
You are advised that it is Macquarie University policy not to set early examinations for individuals or groups of students. All students are expected to ensure that they are available until the end of the teaching semester, that is, the final day of the official examination period.
Special Consideration
The only acceptable reason for not sitting an examination at the designated time is documented illness or unavoidable disruption. In these circumstances you may wish to consider applying for Special Consideration. Information about unavoidable disruption and the special consideration process is available on the web (PDF).
If a Supplementary Examination is granted as a result of the Special Consideration process the examination will be scheduled after the conclusion of the official examination period. For details of the Special Consideration policy specific to the Department of Computing, see the Department's policy page.
To be eligible for special consideration you must show a genuine interest in the unit by participating in its activities. In particular:
- You must get at least 25% of the maximum marks of the combined assignment submissions (that is, 10% of the total unit assessment).
Plagiarism
Please refer to the Department of Computing Plagiarism Policy for the definition of plagiarism, advice on avoiding it and the penalties in place if you are found to have submitted plagiarised work.
University Policy on Grading
Academic Senate has a set of guidelines on the distribution of grades across the range from fail to high distinction. Your final result will include one of these grades plus a standardised numerical grade (SNG).
On occasion your raw mark for a unit (i.e., the total of your marks for each assessment item) may not be the same as the SNG which you receive. Under the Senate guidelines, results may be scaled to ensure that there is a degree of comparability across the university, so that units with the same past performances of their students should achieve similar results.
It is important that you realise that the policy does not require that a minimum number of students are to be failed in any unit. In fact it does something like the opposite, in requiring examiners to explain their actions if more than 20% of students fail in a unit.
Student Support Services
Macquarie University provides a range of Academic Student Support Services. Details of these services can be accessed at http://www.student.mq.edu.au.
Staff-Student Liaison Committee
The Department has established a Staff-Student Liaison Committee at each level (100, 200, 300) to provide all students studying a Computing unit the opportunity to discuss related issues or problems with both students and staff.
For each meeting, an agenda is issued and minutes are taken. These are posted on the web at:
Details of the regular meeting dates will be posted on the unit home page. Anyone with an interest in Computing units may attend. This includes staff involved in the teaching and administration of the units, and all students currently taking a Computing unit at that level. There are formal Liaison Committee representatives for each unit who attend to present the views of the student body; all students are welcome and are encouraged to attend.
The meetings are usually held in the Department of Computing Meeting Room, E6A357.
To forward agenda items or get in touch with your representative, send an email to comp348liaison@ics.mq.edu.au.
If you have exhausted all other avenues, then you should consult the Director of Teaching (Dr Steve Cassidy) or the Head of Department (Assoc. Prof. Bernard Mans). You are entitled to have your concerns raised, discussed and resolved.