COMP348 Document Processing and the Semantic Web
Assignment 1, Part 3: Feedback
Students who completed this (i.e. whose systems produced a result) again did well.
Historically, students have found this part reasonably challenging; I think your good performance
was partly a result of having first got to grips with the framework in Part 2, and then using that
as a base.
Again, the marks on your hardcopies are given for each of the four subparts (quality of results,
quality of code, calculation of accuracy, quality of report), along with the total. Your accuracy
on the unseen data is also written there.
My general comments:
Quality of Results
- As noted in the specs, marks in this section depended on your system's accuracy. I ran your
  programs on new test data that you hadn't seen before, and assigned marks as follows:
  - the most accurate system got 4.5
  - otherwise, systems scoring > 60% got 4
  - otherwise, systems that produced a category for all the data got 3.5
  - otherwise, systems that produced a category for some of the data got 3
  - otherwise, a partial score depending on completeness
- The top 3 systems on the unseen test data were (accuracy as a percentage, to 2 d.p.):
  - Ilun AHN: 93.75%
  - Luke O'REILLY: 82.50%
  - James PISKORZ: 81.25%
Your accuracy is written on your report.
Comparing Part 3 with Part 2, about half of the submissions had a higher accuracy with the SVM,
and half with the rule-based approach. In comparable real-world applications, machine learning
generally does better when there is a large amount of data, and takes less time than constructing
rules by hand.
One possible reason for your SVM system doing worse than your rule-based one is the choice of
features. Say you chose letter triples, as in the prac tasks on classifying text as either Dutch
or English. When you train your SVM and then check it on the training data, it will look as if it
has done quite well (e.g. 75% accuracy); that's because the SVM is in effect memorising the
training data. When you apply it to new data, however, the accuracy will be little different from
50%, which is what you would expect from guessing at random between the two categories, because
letter triples don't differ between Young and Old blog posts. For the original problem of
distinguishing between Dutch and English, on the other hand, letter triples are a good choice of
features: in any new test data, their distribution will differ between the two languages (e.g.
you'll see 'uur' much more often in new Dutch text than in new English text).
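To make the letter-triple idea concrete, here is a minimal Python sketch of counting overlapping letter triples and comparing how often one triple occurs in two texts. It is not the prac code: the function names and the two toy sentences are made up for illustration, and it assumes spaces are kept so that triples can span word boundaries.

```python
from collections import Counter


def letter_triples(text):
    """Count overlapping three-character sequences in lower-cased text.

    Assumption: spaces and punctuation are kept, so triples can span
    word boundaries; your own tokenisation may differ.
    """
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def relative_frequency(counts, triple):
    """Fraction of all triples in `counts` that are equal to `triple`."""
    total = sum(counts.values())
    return counts[triple] / total if total else 0.0


# Toy sentences, invented purely for illustration.
dutch = letter_triples("Het is al een uur later dan de buurman dacht.")
english = letter_triples("It is already an hour later than the neighbour thought.")

print(dutch["uur"], english["uur"])  # 2 0 ('uur' occurs in 'uur' and 'buurman')
print(relative_frequency(dutch, "uur") > relative_frequency(english, "uur"))  # True
```

Counts like these would become the SVM's feature values. A triple such as 'uur' separates the two languages, but there is no reason to expect triples like it to separate Young from Old bloggers.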
Quality of Code
- Again, don't cut and paste code when you should use a function. This happened quite often where
  people duplicated code for the Young and Old categories.
- Similarly, you shouldn't cut and paste code when you should use a list. Python does this really
  neatly. Here's an example. Instead of this:
      if curWord == 'holla':
          YoungFeatures += 1
      if curWord == 'gurl':
          YoungFeatures += 1
      if curWord == 'beserk':
          YoungFeatures += 1
      ...
  use something like this:
      for w in ['holla', 'gurl', 'beserk', ...]:
          if curWord == w:
              YoungFeatures += 1
- Learn how to access the keys of a dictionary, rather than keeping the same information in
  multiple lists.
- Don't hardcode constants, especially if you repeat them in multiple places. What if you want to
  change your threshold?
- If you're calling Python functions from another Python module, use import, not os.system.
  os.system runs a command in a separate shell, so it is only for executing programs external to
  Python.
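The points about functions and lists can be combined: one function that takes the word list as a parameter serves both categories. Below is a minimal sketch. The Old cue words and the names `YOUNG_WORDS`, `OLD_WORDS` and `count_cue_words` are hypothetical, and the `in` test on a set does the same job as the `for` loop in the list example.

```python
# Hypothetical cue-word sets; substitute the words your own system uses.
YOUNG_WORDS = {'holla', 'gurl', 'beserk'}
OLD_WORDS = {'grandchildren', 'mortgage', 'retirement'}


def count_cue_words(tokens, cue_words):
    """Return how many tokens are in the given set of cue words.

    One function handles both categories, so the counting code
    is not duplicated for Young and Old.
    """
    return sum(1 for tok in tokens if tok in cue_words)


tokens = "holla gurl my mortgage is due".split()
young_score = count_cue_words(tokens, YOUNG_WORDS)
old_score = count_cue_words(tokens, OLD_WORDS)
print(young_score, old_score)  # 2 1
```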
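For the dictionary point, here is a sketch of keeping category names and their word lists together in one dictionary instead of in parallel lists. The dictionary contents and the `score_all` function are hypothetical; iterating over a dictionary yields its keys.

```python
# Hypothetical: one dictionary maps each category to its cue words,
# instead of a list of category names plus a separate list of word lists.
CUE_WORDS = {
    'Young': {'holla', 'gurl', 'beserk'},
    'Old': {'grandchildren', 'mortgage', 'retirement'},
}


def score_all(tokens):
    """Return a dict mapping each category name to its cue-word count."""
    scores = {}
    for category in CUE_WORDS:  # looping over a dict gives its keys
        scores[category] = sum(1 for tok in tokens if tok in CUE_WORDS[category])
    return scores


scores = score_all("holla gurl my mortgage is due".split())
print(scores)                       # {'Young': 2, 'Old': 1}
print(max(scores, key=scores.get))  # Young
```

Adding a third category then means adding one dictionary entry, with no other code changes.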
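For the point about constants, here is a sketch of defining a threshold once, as a named constant that is also used as a default parameter, rather than repeating a literal number. The decision rule and the name `MIN_CUE_WORDS` are invented for illustration and are not a recommended classifier.

```python
# Hypothetical threshold, defined in one place: changing it is a one-line edit.
MIN_CUE_WORDS = 2


def classify(young_score, old_score, threshold=MIN_CUE_WORDS):
    """Label a post 'Young' if it has at least `threshold` young cue words
    and more young than old cue words; otherwise label it 'Old'."""
    if young_score >= threshold and young_score > old_score:
        return 'Young'
    return 'Old'


print(classify(2, 1))               # Young
print(classify(2, 1, threshold=3))  # Old
```

Making the threshold a parameter also lets you try several values without editing the code.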
Calculation of Accuracy
- In calculating z-scores, some people had a lower accuracy for the SVM than for the rule-based
  system, and so ended up with a negative z-score. The sign only tells you which system did
  better: if the absolute value of z is larger than 1.96, the difference is significant (at the
  5% level, two-tailed). I don't think my lecture notes were sufficiently clear on that, so I
  didn't take any marks off for this error.
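As a sketch of why the sign doesn't matter, the code below computes a z statistic for the difference between two accuracies and tests its absolute value against 1.96. It assumes the standard pooled two-proportion z-test with both systems evaluated on the same number of test items. The lecture notes' formula may differ, and a paired test would arguably suit a shared test set better. The counts in the example are made up.

```python
import math


def two_proportion_z(correct_a, correct_b, n):
    """z statistic for accuracy(A) - accuracy(B), each measured on n items.

    Assumption: pooled two-proportion z-test; this may not be the exact
    formula in the lecture notes.
    """
    p_a = correct_a / n
    p_b = correct_b / n
    p_pool = (correct_a + correct_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
    return (p_a - p_b) / se


def significant(z, critical=1.96):
    """Two-tailed test at the 5% level: only the size of z matters."""
    return abs(z) > critical


# Made-up example: the SVM gets 50/100 right, the rule-based system 65/100.
z = two_proportion_z(50, 65, 100)
print(round(z, 2), significant(z))  # -2.15 True
```

Swapping the order of the two systems flips the sign of z but leaves `significant(z)` unchanged.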
Report
- System descriptions: the aim of the system description is for you to tell me things like how
  you chose your features and how you tokenised. It should be sufficiently detailed that I could
  (with a bit of effort, and perhaps not exactly) reproduce your approach. It's not for describing
  things like "And then I open a file using open() ...".
- The specs asked you to describe how to run the system. Where you didn't (and especially if you
  didn't include model.dat as specified), it was often painful for me to have to guess how to run
  it on my new data.
- MAKE SURE YOU SPELLCHECK YOUR REPORTS. It's also useful to have someone proofread what you've
  written.
Mark Dras or