Department of Computing

COMP348 Document Processing and the Semantic Web

Tutorial Week 7

Evaluating Machine Translation

In lectures, I briefly talked about Bleu as a metric for evaluating translation. There are some additional complexities in calculating a Bleu score that I didn't mention, but here we're just going to look at a simple version, which we'll call Bleu-light, to consider some of its characteristics. (Note as well that the real Bleu should only be applied to documents as a whole, rather than individual sentences.)

  1. In Bleu-light, we calculate:

    • the precision of unigrams of a candidate with respect to a reference (i.e. the number of unigrams in the candidate that also occur in the reference, divided by the total number of unigrams in the candidate); and

    • the precision of bigrams of a candidate with respect to a reference (i.e. the number of bigrams in the candidate that also occur in the reference, divided by the total number of bigrams in the candidate).

    Calculate these for the following two candidate translations:

    Reference
    Thousands of university students and civic groups staged a rally yesterday in front of Seoul City Hall 
    to protest universities' decision to raise tuition fees.
    
    Candidate 1
    College lifestyles and thousands person civil society member backs opened a 
    registration fee increase opposition meeting yesterday from before Seoul City Hall watching.
    
    Candidate 2
    Thousands of people, including members of civic groups, college students and 
    in front of Seoul City Hall yesterday held a rally opposing tuition fee increases.
    
  2. There are a number of possible ways of combining these two scores, the unigram precision and the bigram precision, to get a single value representing the goodness of the translation. One is the arithmetic mean, here (unigram_precision + bigram_precision) / 2; another (which Bleu actually uses) is the geometric mean, here sqrt(unigram_precision * bigram_precision). What is the effect of using the geometric mean rather than the arithmetic mean? (Consider the relative sizes of the unigram and bigram precisions.)

  3. Consider a candidate translation as follows:

    Candidate 3
    to raise to raise to raise to raise to raise to raise to raise to raise  
    to raise to raise to raise to raise to raise to raise to raise to raise.
    

    What are the unigram and bigram precisions here? What consequence does this have for Bleu-light?
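The Bleu-light quantities described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the tutorial's own material: the function names are hypothetical, tokenisation is a simple lowercase split on word characters (so punctuation is ignored), and precision is left unclipped, meaning every occurrence of a candidate n-gram counts as long as that n-gram appears somewhere in the reference.

```python
import math
import re


def tokens(text):
    # Assumption: lowercase the text and keep runs of word characters only.
    return re.findall(r"\w+", text.lower())


def ngrams(toks, n):
    # All contiguous n-grams of the token list, as tuples.
    return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]


def precision(candidate, reference, n):
    # Fraction of candidate n-grams that also occur in the reference.
    # No clipping: repeated candidate n-grams are each counted.
    cand = ngrams(tokens(candidate), n)
    ref = set(ngrams(tokens(reference), n))
    if not cand:
        return 0.0
    return sum(1 for g in cand if g in ref) / len(cand)


def arithmetic_mean(p1, p2):
    return (p1 + p2) / 2


def geometric_mean(p1, p2):
    # The sqrt-of-product combination from question 2.
    return math.sqrt(p1 * p2)


# Small made-up example (not one of the tutorial's candidates).
ref = "decision to raise tuition fees"
cand = "to raise to raise"
p1 = precision(cand, ref, 1)
p2 = precision(cand, ref, 2)
print(p1, p2, arithmetic_mean(p1, p2), geometric_mean(p1, p2))
```

Running the same functions on the reference and the three candidates above is one way to check hand calculations. Note that the result depends on the tokenisation choice, for example whether "fee" and "fees" are treated as different words (here they are).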

Transfer-based MT

You are given the following pairs of sentences in English and Quenya (an Elvish language from Lord of the Rings).

The book is red.        I parma carnë ná.
A book is red.          Parma carnë ná.
The monster is evil.    I ulundo úmëa ná.
A monster is evil.      Ulundo úmëa ná.
The elf eats.           I Elda máta.
The elf eats bread.     I Elda máta massa.

  1. What are the parts of speech for each of the words in the English sentences?

  2. What is the correspondence between the English and Quenya words?

  3. What rules would you infer in order to apply a transfer-based MT approach to translation between English and Quenya? Explain using two specific instances of rearrangements from the sentence pairs above.
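As a rough illustration of how inferred rules might be applied mechanically, the following Python sketch encodes one possible reading of the six sentence pairs. Everything in it is an assumption made for illustration: the word correspondences in `LEXICON`, the two rules, and the function name `translate` are hypothetical, and a real transfer-based system would operate on a parse of the sentence rather than on the raw word sequence.

```python
# One possible word correspondence, read off the sentence pairs (assumption).
LEXICON = {
    "the": "i",
    "book": "parma",
    "red": "carnë",
    "is": "ná",
    "monster": "ulundo",
    "evil": "úmëa",
    "elf": "Elda",
    "eats": "máta",
    "bread": "massa",
}


def translate(sentence):
    words = sentence.rstrip(".").lower().split()

    # Assumed rule 1: the English indefinite article has no Quenya counterpart.
    words = [w for w in words if w != "a"]

    # Assumed rule 2: the copula moves to the end of the clause,
    # so "is red" becomes "red is".
    if "is" in words:
        i = words.index("is")
        words = words[:i] + words[i + 1:] + ["is"]

    # Lexical transfer, then restore sentence-initial capital and full stop.
    out = [LEXICON[w] for w in words]
    out[0] = out[0][0].upper() + out[0][1:]
    return " ".join(out) + "."


print(translate("The book is red."))
print(translate("A monster is evil."))
print(translate("The elf eats bread."))
```

The sketch only covers the vocabulary and constructions in the table above; any other input word raises a `KeyError`.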

Text Classification

In Assignment 1, your task is to distinguish Young (ages 13-17) from Old (ages 33-47) blog posts. In this question, you'll be looking at a different range of ages, and a different type of writing, for classification.

The LUCY corpus contains samples of written English from a range of different age groups, with a focus on younger children:

  • "Polished" writing: 41 files, 102,000 words
    • B, informative: 34 files, 84,000 words
    • C, imaginative: 7 files, 17,000 words
  • Young Adult writing, E: 48 files, 33,000 words
  • Child writing: 150 files, 30,000 words
    • F, 12-year-olds: 37 files, 8000 words
    • H, 11-year-olds: 36 files, 7000 words
    • K, 10-year-olds: 29 files, 6000 words
    • M, 9-year-olds: 48 files, 9000 words

The writing is marked up with additional information, in particular the words' parts of speech and some syntactic structure. Here are two samples:

File E02: Young Adult writing
0000040	00010	-	YBL		.
0000050	00010	-	II	In	[O[S[P:p.
0000060	00010	-	AT	the	[Ns.
0000070	00010	-	JJ	Western	.
0000080	00010	-	NN1	world	.Ns]P:p]
0000090	00010	-	RT	today	[R:t.R:t]
0000100	00010	-	YC	+,	.
0000110	00010	-	AT1	a	[Ns:s.
0000120	00010	-	NN1	storm	.
0000130	00010	-	IO	of	[Po.
0000140	00010	-	NN1	controversy	.Po]Ns:s]
0000150	00010	-	NN2	rages	[Vz.Vz]
0000160	00010	-	II	over	[P:r.
0000170	00010	-	AT	the	[N.
0000180	00010	-	YIL		.
0000190	00010	-	NN1	+cult	.
0000200	00010	-	IO	of	[Po.
0000210	00010	-	NN1	violence	.Po]
0000220	00010	-	YIR	+	.
0000230	00010	-	CC	or	[N+.
0000240	00010	-	JJ	excessive	.
0000250	00010	-	NN1	portrayal	[NN1n&.
0000260	00010	-	CC	and	[NN1u+.
0000270	00010	-	NN1	glorification	.NN1u+]NN1n&]
0000280	00010	-	IO	of	[Po.
0000290	00010	-	NN1	violence	.Po]
0000300	00010	-	II	by	[Pb.
0000310	00010	-	NN2	movies	[NN2&.
0000320	00010	-	CC	and	[NN1n+.
0000330	00010	-	NN1	television	.NN1n+]NN2&]Pb]N+]N]P:r]S]
0000340	00010	-	YF	+.	.
File M02: 9-year-olds
0000050	00010	-	YBL		.
0000060	00010	-	RG	About	[O[S[Rx:t.
0000070	00010	-	MC	five	[Np[M.
0000080	00010	-	CC	and	[Ns+.
0000090	00010	-	AT1	a	.
0000100	00010	-	NN1	half	.Ns+]M]
0000110	00010	-	NNT2	years	.Np]
0000120	00010	-	RA	ago	.Rx:t]
0000140	00010	-	APPG	my	[Ns:s.
0000150	00010	-	NNS1	grandpa	.Ns:s]
0000160	00010	-	VVD	came	[Vd.Vd]
0000170	00010	-	RL	home	[R:q.R:q]
0000180	00010	-	II	from	[P:q.
0000190	00010	-	APPG	his	[Ns.
0000200	00010	-	NN1	holiday	.
0000210	00010	-	II	in	[P.
0000230	00010	-	NP1	Italy	[Nns.Nns]P]Ns]P:q]S]
0000240	00010	-	YF	+.	.
0000250	00010	-	PPHS1	He	[S[Nas:s.Nas:s]
0000260	00010	-	VVD	brought	[Vd.Vd]
0000270	00010	-	PPIO1	me	[Neo:i.Neo:i]
0000280	00010	-	AT1	a	[Ns:o.
0000290	00010	-	JJ	little	.
0000300	00010	-	MC	two	[Ns.
0000310	00010	-	NN1	wheeler	.Ns]
0000330	00010	-	NN1	bike	.Ns:o]S]
0000340	00010	-	YF	+.	.
0000350	00010	-	PPHS1	He	[S[Nas:s.Nas:s]
0000360	00010	-	VVD	brought	[Vd.Vd]
0000370	00010	-	PPIO1	me	[Neo:i.Neo:i]
0000380	00010	-	MC	two	[Np:o.
0000390	00010	-	JJ	little	.
0000400	00010	-	NN1	side	.
0000410	00010	-	NN2	wheels	.Np:o]
0000430	00010	-	CS~CSi	if	[Fa:c.
0000440	00010	-	PPIS1	I	[Nea:s.Nea:s]
0000450	00010	-	VM	could	[Vdce.
0000460	00010	-	XX	not	.
0000470	00010	-	VV0	ride	.Vdce]
0000480	00010	-	PPH1	it	[Ni:o.Ni:o]Fa:c]S]
0000490	00010	-	YF	+.	.

What features might you consider focussing on in text classification here?
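To make the question concrete, here is a minimal Python sketch of how a few candidate features could be read off files in this format. It rests on assumptions inferred only from the two samples above: that the columns are tab-separated, that the fourth column is the part-of-speech tag and the fifth the word, that tags beginning with `Y` mark punctuation or layout, and that `YF` marks the end of a sentence. The function name `lucy_features` and the chosen features are hypothetical, not a recommended feature set.

```python
from collections import Counter


def lucy_features(lines):
    tag_counts = Counter()
    word_lengths = []
    sentence_ends = 0

    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip lines that do not match the assumed layout
        tag, word = fields[3], fields[4]

        # Assumption: Y-tags are punctuation/layout, YF is sentence-final.
        if tag.startswith("Y"):
            if tag == "YF":
                sentence_ends += 1
            continue

        tag_counts[tag] += 1
        # A leading "+" appears on some tokens in the samples; ignore it.
        word_lengths.append(len(word.lstrip("+")))

    n_words = len(word_lengths)
    if n_words == 0:
        return {}

    features = {
        "mean_word_length": sum(word_lengths) / n_words,
        "words_per_sentence": n_words / max(sentence_ends, 1),
    }
    # Relative frequency of each part-of-speech tag.
    for tag, count in tag_counts.items():
        features["tag:" + tag] = count / n_words
    return features


# Three tokens and a sentence end, taken from the M02 sample.
sample = [
    "0000250\t00010\t-\tPPHS1\tHe\t[S[Nas:s.Nas:s]",
    "0000260\t00010\t-\tVVD\tbrought\t[Vd.Vd]",
    "0000270\t00010\t-\tPPIO1\tme\t[Neo:i.Neo:i]",
    "0000340\t00010\t-\tYF\t+.\t.",
]
print(lucy_features(sample))
```

The last column (the syntactic bracketing) is ignored here; features such as nesting depth could be derived from it in the same way.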

Statistical Machine Translation

We want to translate the following sentence from English to Dutch: I am very happy

We want to solve this with statistical MT. In this question we ignore sentence alignment and language divergences, so we are left with the following translation data:

English   Dutch      P(Dutch|English)
I         ikzelf     0.2
I         ik         0.6
I         mij        0.2
am        ben        0.8
am        is         0.2
very      erg        1
happy     gelukkig   0.5
happy     blij       0.5

Dutch     English    P(English|Dutch)
ikzelf    I          0.8
ik        I          0.5
mij       I          0.1
ben       am         0.6
is        am         0.2
erg       very       1
gelukkig  happy      0.2
blij      happy      0.6

  1. Which of the two translation tables is needed to build our translation model?

  2. For the language model on the target side we use a trigram language model. This means that the fluency of a Dutch sentence is computed with the following formula, assuming that the Dutch sentence consists of the words $d = w_1, w_2, \ldots, w_n$:

    $P(d) = \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})$

    In the following trigram model, φ is the beginning-of-sentence marker, so $w_0 = w_{-1} = φ$. We have the following data:

    (W1,W2) W3 P(W3|W1,W2)
    (φ,φ) ik 0.6
    (φ,φ) ikzelf 0.3
    (φ,φ) mij 0.1
    (φ,ik) is 0.1
    (φ,ik) ben 0.9
    (φ,ikzelf) is 0.4
    (φ,ikzelf) ben 0.6
    (φ,mij) is 0.3
    (φ,mij) ben 0.7
    (ik,is) erg 1
    (ikzelf,is) erg 1
    (mij,is) erg 1
    (ik,ben) erg 1
    (ikzelf,ben) erg 1
    (mij,ben) erg 1
    (ben,erg) blij 0.45
    (ben,erg) gelukkig 0.55
    (is,erg) blij 0.45
    (is,erg) gelukkig 0.55

    Build the most likely translation into Dutch.
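The trigram formula for P(d) can be checked mechanically with a short Python sketch. It covers only the language-model factor: the function name `sentence_prob` and the dictionary layout are assumptions made for illustration, and any trigram not listed in the table is given probability 0. The translation-model probabilities would still have to be multiplied in to rank complete translations.

```python
BOS = "φ"  # beginning-of-sentence marker

# P(w3 | w1, w2), copied from the trigram table above.
TRIGRAM = {
    (BOS, BOS): {"ik": 0.6, "ikzelf": 0.3, "mij": 0.1},
    (BOS, "ik"): {"is": 0.1, "ben": 0.9},
    (BOS, "ikzelf"): {"is": 0.4, "ben": 0.6},
    (BOS, "mij"): {"is": 0.3, "ben": 0.7},
    ("ik", "is"): {"erg": 1.0},
    ("ikzelf", "is"): {"erg": 1.0},
    ("mij", "is"): {"erg": 1.0},
    ("ik", "ben"): {"erg": 1.0},
    ("ikzelf", "ben"): {"erg": 1.0},
    ("mij", "ben"): {"erg": 1.0},
    ("ben", "erg"): {"blij": 0.45, "gelukkig": 0.55},
    ("is", "erg"): {"blij": 0.45, "gelukkig": 0.55},
}


def sentence_prob(words):
    # Product over i of P(w_i | w_{i-2}, w_{i-1}), starting from (φ, φ).
    history = (BOS, BOS)
    prob = 1.0
    for w in words:
        # Assumption: unseen trigrams have probability 0.
        prob *= TRIGRAM.get(history, {}).get(w, 0.0)
        history = (history[1], w)
    return prob


print(sentence_prob("ik ben erg blij".split()))
print(sentence_prob("mij is erg gelukkig".split()))
```

For example, "ik ben erg blij" is scored as 0.6 × 0.9 × 1 × 0.45, one factor per word.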


Comments to: Mark Dras or Diego Molla
