COMP348 Document Processing and the Semantic Web
Tutorial Week 7
Evaluating Machine Translation
In lectures, I briefly talked about Bleu as a metric for evaluating translation. There are
some additional complexities in calculating a Bleu score that I didn't mention, but here we're
just going to look at a simple version, which we'll call Bleu-light, to consider some of
its characteristics. (Note as well that the real Bleu
should only be applied to documents as a whole, rather than individual sentences.)
- In Bleu-light, we calculate:
  - the precision of unigrams of a candidate with respect to a reference (i.e. the proportion of
    unigrams in the candidate that also occur in the reference); and
  - the precision of bigrams of a candidate with respect to a reference (i.e. the proportion of
    bigrams in the candidate that also occur in the reference).
Calculate these for the following two candidate translations:
Reference
Thousands of university students and civic groups staged a rally yesterday in front of Seoul City Hall
to protest universities' decision to raise tuition fees.
Candidate 1
College lifestyles and thousands person civil society member backs opened a
registration fee increase opposition meeting yesterday from before Seoul City Hall watching.
Candidate 2
Thousands of people, including members of civic groups, college students and
in front of Seoul City Hall yesterday held a rally opposing tuition fee increases.
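The two precisions can be sketched in a few lines of Python. This is only an illustration of the Bleu-light definition, not the full Bleu metric; the function names and the tokenisation (lowercasing, keeping runs of letters, digits and apostrophes) are assumptions, and a different tokenisation will give slightly different numbers.

```python
import re


def tokenize(text):
    # Assumed tokenisation: lowercase, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    # All contiguous n-grams of the token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def precision(candidate, reference, n):
    # Proportion of candidate n-grams that also occur somewhere in the reference.
    cand = ngrams(tokenize(candidate), n)
    ref = set(ngrams(tokenize(reference), n))
    if not cand:
        return 0.0
    return sum(1 for gram in cand if gram in ref) / len(cand)
```

Calling `precision(candidate, reference, 1)` and `precision(candidate, reference, 2)` on each candidate above gives the unigram and bigram precisions under these assumptions.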
- There are a number of possible ways of combining these two scores, the unigram precision and the
  bigram precision, to get a single value representing the goodness of the translation.
  One is the arithmetic mean, here (unigram_precision + bigram_precision) / 2; another
  (which Bleu actually does) is the geometric mean, here sqrt(unigram_precision * bigram_precision).
  What is the effect of using the geometric mean rather than the arithmetic mean?
  (Consider the relative sizes of the unigram and bigram precisions.)
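To get a feel for the difference, the two combinations can be compared on a few made-up precision pairs. The sketch below uses hypothetical function names; note that sqrt(unigram_precision * bigram_precision) is the geometric mean of the two values.

```python
import math


def arithmetic_mean(unigram_precision, bigram_precision):
    return (unigram_precision + bigram_precision) / 2


def geometric_mean(unigram_precision, bigram_precision):
    return math.sqrt(unigram_precision * bigram_precision)


# Made-up precision pairs, chosen only to show how the two means diverge.
for p1, p2 in [(0.5, 0.5), (0.8, 0.2), (0.5, 0.0)]:
    print(p1, p2, arithmetic_mean(p1, p2), geometric_mean(p1, p2))
```

When the two precisions are equal the means agree; the further apart they are, the more the square-root form is pulled towards the smaller one, and it is zero whenever either precision is zero.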
- Consider a candidate translation as follows:
Candidate 3
to raise to raise to raise to raise to raise to raise to raise to raise
to raise to raise to raise to raise to raise to raise to raise to raise.
What are the unigram and bigram precisions here? What consequence does this have for
Bleu-light?
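One of the additional complexities of the real Bleu is relevant here: it clips the count of each candidate n-gram at the number of times that n-gram occurs in the reference. The sketch below contrasts the Bleu-light unigram precision with a clipped version. The function name and the whitespace tokenisation are assumptions, and the example strings are shortened stand-ins rather than the sentences above.

```python
from collections import Counter


def unigram_precisions(candidate, reference):
    # Returns (unclipped, clipped) unigram precision; whitespace tokenisation assumed.
    cand = candidate.lower().split()
    if not cand:
        return 0.0, 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(reference.lower().split())
    # Bleu-light: every candidate token found anywhere in the reference counts.
    unclipped = sum(c for w, c in cand_counts.items() if w in ref_counts)
    # Clipped: a word is credited at most as often as it occurs in the reference.
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return unclipped / len(cand), clipped / len(cand)


print(unigram_precisions("to raise to raise", "decision to raise fees"))
```

In this small example every candidate word occurs in the reference, so the unclipped precision is perfect, while the clipped version credits "to" and "raise" only once each.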
Transfer-based MT
You are given the following pairs of sentences in English and Quenya
(an Elvish language from Lord of the Rings).
The book is red. I parma carnë ná.
A book is red. Parma carnë ná.
The monster is evil. I ulundo úmëa ná.
A monster is evil. Ulundo úmëa ná.
The elf eats. I Elda máta.
The elf eats bread. I Elda máta massa.
- What are the parts of speech for each of the words in the English sentences?
- What is the correspondence between the English and Quenya words?
- What rules would you infer in order to apply a transfer-based MT approach to translation
  between English and Quenya? Explain using two specific
  instances of rearrangements from the sentence pairs above.
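As a rough illustration of what a transfer-based system does with such rules, here is a toy English-to-Quenya translator. Everything in it is an assumption inferred from the six sentence pairs above: the lexicon, the rule that the copula moves behind the following word, and the rule that the indefinite article has no Quenya counterpart. It is one possible reading of the data, not a description of Quenya grammar.

```python
# Word-for-word lexicon inferred from the sentence pairs (an assumption).
LEXICON = {
    "the": "i", "a": "",  # the indefinite article is assumed to be dropped
    "book": "parma", "monster": "ulundo", "elf": "Elda", "bread": "massa",
    "red": "carnë", "evil": "úmëa",
    "is": "ná", "eats": "máta",
}


def translate(sentence):
    words = sentence.rstrip(".").lower().split()
    # Structural transfer (assumed rule): "is ADJ" becomes "ADJ is".
    if "is" in words:
        i = words.index("is")
        if i + 1 < len(words):
            words[i], words[i + 1] = words[i + 1], words[i]
    # Lexical transfer: look up each word, skipping words that map to nothing.
    out = [LEXICON[w] for w in words if LEXICON[w]]
    # Restore sentence-initial capitalisation and the final full stop.
    out[0] = out[0][0].upper() + out[0][1:]
    return " ".join(out) + "."


print(translate("The book is red."))
```

A real transfer system would apply its reordering rules to a parse tree rather than to a flat word list; the flat version only works here because the sentences are so short.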
Text Classification
In Assignment 1, your task is to distinguish blog posts by Young authors (ages 13-17) from those by Old authors (ages 33-47).
In this question, you'll be looking at a different range of ages, and a different
type of writing, for classification.
The LUCY corpus contains samples
of written English from a range of different age groups, with a focus on younger children:
- "Polished" writing: 41 files, 102,000 words
  - B, informative: 34 files, 84,000 words
  - C, imaginative: 7 files, 17,000 words
- Young Adult writing, E: 48 files, 33,000 words
- Child writing: 150 files, 30,000 words
  - F, 12-year-olds: 37 files, 8000 words
  - H, 11-year-olds: 36 files, 7000 words
  - K, 10-year-olds: 29 files, 6000 words
  - M, 9-year-olds: 48 files, 9000 words
The writing is marked up with additional information, in particular the words' parts of speech
and some syntactic structure.
Here are two samples:
File E02: Young Adult writing
0000040 00010 - YBL .
0000050 00010 - II In [O[S[P:p.
0000060 00010 - AT the [Ns.
0000070 00010 - JJ Western .
0000080 00010 - NN1 world .Ns]P:p]
0000090 00010 - RT today [R:t.R:t]
0000100 00010 - YC +, .
0000110 00010 - AT1 a [Ns:s.
0000120 00010 - NN1 storm .
0000130 00010 - IO of [Po.
0000140 00010 - NN1 controversy .Po]Ns:s]
0000150 00010 - NN2 rages [Vz.Vz]
0000160 00010 - II over [P:r.
0000170 00010 - AT the [N.
0000180 00010 - YIL .
0000190 00010 - NN1 +cult .
0000200 00010 - IO of [Po.
0000210 00010 - NN1 violence .Po]
0000220 00010 - YIR + .
0000230 00010 - CC or [N+.
0000240 00010 - JJ excessive .
0000250 00010 - NN1 portrayal [NN1n&.
0000260 00010 - CC and [NN1u+.
0000270 00010 - NN1 glorification .NN1u+]NN1n&]
0000280 00010 - IO of [Po.
0000290 00010 - NN1 violence .Po]
0000300 00010 - II by [Pb.
0000310 00010 - NN2 movies [NN2&.
0000320 00010 - CC and [NN1n+.
0000330 00010 - NN1 television .NN1n+]NN2&]Pb]N+]N]P:r]S]
0000340 00010 - YF +. .
File M02: 9-year-olds
0000050 00010 - YBL .
0000060 00010 - RG About [O[S[Rx:t.
0000070 00010 - MC five [Np[M.
0000080 00010 - CC and [Ns+.
0000090 00010 - AT1 a .
0000100 00010 - NN1 half .Ns+]M]
0000110 00010 - NNT2 years .Np]
0000120 00010 - RA ago .Rx:t]
0000140 00010 - APPG my [Ns:s.
0000150 00010 - NNS1 grandpa .Ns:s]
0000160 00010 - VVD came [Vd.Vd]
0000170 00010 - RL home [R:q.R:q]
0000180 00010 - II from [P:q.
0000190 00010 - APPG his [Ns.
0000200 00010 - NN1 holiday .
0000210 00010 - II in [P.
0000230 00010 - NP1 Italy [Nns.Nns]P]Ns]P:q]S]
0000240 00010 - YF +. .
0000250 00010 - PPHS1 He [S[Nas:s.Nas:s]
0000260 00010 - VVD brought [Vd.Vd]
0000270 00010 - PPIO1 me [Neo:i.Neo:i]
0000280 00010 - AT1 a [Ns:o.
0000290 00010 - JJ little .
0000300 00010 - MC two [Ns.
0000310 00010 - NN1 wheeler .Ns]
0000330 00010 - NN1 bike .Ns:o]S]
0000340 00010 - YF +. .
0000350 00010 - PPHS1 He [S[Nas:s.Nas:s]
0000360 00010 - VVD brought [Vd.Vd]
0000370 00010 - PPIO1 me [Neo:i.Neo:i]
0000380 00010 - MC two [Np:o.
0000390 00010 - JJ little .
0000400 00010 - NN1 side .
0000410 00010 - NN2 wheels .Np:o]
0000430 00010 - CS~CSi if [Fa:c.
0000440 00010 - PPIS1 I [Nea:s.Nea:s]
0000450 00010 - VM could [Vdce.
0000460 00010 - XX not .
0000470 00010 - VV0 ride .Vdce]
0000480 00010 - PPH1 it [Ni:o.Ni:o]Fa:c]S]
0000490 00010 - YF +. .
What features might you consider focussing on in text classification here?
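As a starting point for experimenting, the sketch below turns lines in the format shown above into a small feature dictionary: the relative frequency of each part-of-speech tag and the mean word length. The column layout is an assumption read off the two samples (whitespace-separated fields, tag in the fourth field, word in the fifth, a leading `+` on some tokens, and tags starting with `Y` marking punctuation or layout); the function names are hypothetical.

```python
from collections import Counter


def parse_line(line):
    # Assumed layout: id, number, status, POS tag, word, parse fragment.
    fields = line.split()
    if len(fields) == 6:
        return fields[3], fields[4].lstrip("+")
    if len(fields) == 5:  # lines such as YBL carry no word
        return fields[3], ""
    return None


def features(lines):
    tags = Counter()
    lengths = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        tag, word = parsed
        tags[tag] += 1
        # Skip punctuation and layout tags (assumed to start with "Y").
        if word and not tag.startswith("Y"):
            lengths.append(len(word))
    total = sum(tags.values())
    feats = {f"tag={t}": c / total for t, c in tags.items()}
    feats["mean_word_length"] = sum(lengths) / len(lengths) if lengths else 0.0
    return feats


sample = [
    "0000140 00010 - APPG my [Ns:s.",
    "0000150 00010 - NNS1 grandpa .Ns:s]",
    "0000160 00010 - VVD came [Vd.Vd]",
    "0000240 00010 - YF +. .",
]
print(features(sample))
```

The same loop could be extended with features drawn from the parse column, for example the depth of bracket nesting.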
Statistical Machine Translation
We want to translate the following sentence from English to Dutch: I am very happy
We want to solve this with statistical MT. In this question we ignore sentence alignment and language divergences, so we are left with the following translation data:
| English | Dutch    | P(Dutch|English) |
| I       | ikzelf   | 0.2              |
| I       | ik       | 0.6              |
| I       | mij      | 0.2              |
| am      | ben      | 0.8              |
| am      | is       | 0.2              |
| very    | erg      | 1                |
| happy   | gelukkig | 0.5              |
| happy   | blij     | 0.5              |

| Dutch    | English | P(English|Dutch) |
| ikzelf   | I       | 0.8              |
| ik       | I       | 0.5              |
| mij      | I       | 0.1              |
| ben      | am      | 0.6              |
| is       | am      | 0.2              |
| erg      | very    | 1                |
| gelukkig | happy   | 0.2              |
| blij     | happy   | 0.6              |

- Which of the two translation tables is needed to build our translation model?
- For the language model on the target side we use a trigram language model. This means that the fluency of a Dutch sentence is computed with the following formula, assuming that the Dutch sentence consists of the words d = w_1, w_2, ..., w_n:
  $$P(d) = \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})$$
  In the following trigram model, φ is the beginning-of-sentence marker, so we have the following data:
| (W1,W2)      | W3       | P(W3|W1,W2) |
| (φ,φ)        | ik       | 0.6         |
| (φ,φ)        | ikzelf   | 0.3         |
| (φ,φ)        | mij      | 0.1         |
| (φ,ik)       | is       | 0.1         |
| (φ,ik)       | ben      | 0.9         |
| (φ,ikzelf)   | is       | 0.4         |
| (φ,ikzelf)   | ben      | 0.6         |
| (φ,mij)      | is       | 0.3         |
| (φ,mij)      | ben      | 0.7         |
| (ik,is)      | erg      | 1           |
| (ikzelf,is)  | erg      | 1           |
| (mij,is)     | erg      | 1           |
| (ik,ben)     | erg      | 1           |
| (ikzelf,ben) | erg      | 1           |
| (mij,ben)    | erg      | 1           |
| (ben,erg)    | blij     | 0.45        |
| (ben,erg)    | gelukkig | 0.55        |
| (is,erg)     | blij     | 0.45        |
| (is,erg)     | gelukkig | 0.55        |
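As a worked instance of the trigram formula, a four-word Dutch sentence expands as follows. The padding of the two positions before the first word with φ is an assumption, read off the (φ,φ) and (φ,W) rows of the table.

```latex
P(w_1 w_2 w_3 w_4) =
  P(w_1 \mid \varphi, \varphi)\,
  P(w_2 \mid \varphi, w_1)\,
  P(w_3 \mid w_1, w_2)\,
  P(w_4 \mid w_2, w_3)
```

Each factor on the right corresponds to one row lookup in the table above.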
Build the most likely translation into Dutch.
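One way to check a hand calculation is to enumerate every word-for-word candidate and score it. The sketch below assumes the standard noisy-channel formulation, in which a Dutch candidate d for the English sentence e is scored as P(e|d) · P(d), with P(e|d) taken as the product of the word-level P(English|Dutch) entries and P(d) from the trigram model. The function names and the one-to-one, same-order alignment are assumptions made for illustration.

```python
from itertools import product

# P(English | Dutch) from the second translation table, keyed by (dutch, english).
TM = {
    ("ikzelf", "I"): 0.8, ("ik", "I"): 0.5, ("mij", "I"): 0.1,
    ("ben", "am"): 0.6, ("is", "am"): 0.2,
    ("erg", "very"): 1.0,
    ("gelukkig", "happy"): 0.2, ("blij", "happy"): 0.6,
}

BOS = "φ"  # beginning-of-sentence marker
LM = {}    # trigram table, keyed by (w1, w2, w3)
for first, p in (("ik", 0.6), ("ikzelf", 0.3), ("mij", 0.1)):
    LM[(BOS, BOS, first)] = p
for first, p_is, p_ben in (("ik", 0.1, 0.9), ("ikzelf", 0.4, 0.6), ("mij", 0.3, 0.7)):
    LM[(BOS, first, "is")] = p_is
    LM[(BOS, first, "ben")] = p_ben
    for verb in ("is", "ben"):
        LM[(first, verb, "erg")] = 1.0
for verb in ("is", "ben"):
    LM[(verb, "erg", "blij")] = 0.45
    LM[(verb, "erg", "gelukkig")] = 0.55


def lm_prob(words):
    # Trigram probability of a Dutch word sequence, padded with two markers.
    history = [BOS, BOS]
    p = 1.0
    for w in words:
        p *= LM.get((history[-2], history[-1], w), 0.0)
        history.append(w)
    return p


def tm_prob(dutch, english):
    # Word-by-word P(English | Dutch), assuming a one-to-one monotone alignment.
    p = 1.0
    for d, e in zip(dutch, english):
        p *= TM.get((d, e), 0.0)
    return p


def best_translation(english):
    # Enumerate every combination of Dutch options and keep the best-scoring one.
    options = [[d for (d, e) in TM if e == word] for word in english]
    scored = [(tm_prob(c, english) * lm_prob(c), " ".join(c))
              for c in product(*options)]
    return max(scored)


print(best_translation("I am very happy".split()))
```

Exhaustive enumeration is fine for twelve candidates; a real decoder would use beam search or a similar pruned search instead.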
Mark Dras or