Assignment-03 IBM1 model for Machine Translation
I)
Develop an IBM1 model for the source (S1) and target (Ti) languages (around 20/50/100 sentences).
Get translations of the source sentences from translation tools T1-T5.
For each translation, develop an IBM1 model and compare your results.
Analyze each translation with respect to problems like picking
the wrong word sense of a polysemous word, or word ordering problems,
or other problems. Document your observations explicitly for each sentence.
Generate a table containing the word translation probabilities that were learned.
List the reference sentence and the sentence translated using the IBM1 model.
Plot graphs for your results.
Compare alignments with mixed case versus lowercase.
Generate an alignment in the opposite direction.
Calculate p(e|f) for the first three sentences.
Use the following word alignment tool:
http://www.cis.uni-muenchen.de/~fraser/nepal/align_browser_and_german_short.zip
Customize the tool for your set of translations.
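
As a starting point for the tasks in part I, the following is a minimal sketch of IBM1 training with EM, assuming whitespace-tokenised sentence pairs. The function names (train_ibm1, sentence_prob, align), the NULL token, the epsilon constant, and the three-sentence German-English toy corpus are illustrative assumptions and are not part of the alignment tool above. The sketch covers the table of learned word translation probabilities, the lowercase variant, the opposite direction, and p(e|f) for a sentence pair.

```python
from collections import defaultdict

NULL = "<null>"  # assumed placeholder for the empty source word


def train_ibm1(pairs, iterations=20, lowercase=False):
    """Estimate t(e|f) by EM from (source_tokens, target_tokens) pairs."""
    if lowercase:
        pairs = [([w.lower() for w in f], [w.lower() for w in e]) for f, e in pairs]
    # Prepend NULL so that a target word may align to no source word.
    pairs = [([NULL] + list(f), list(e)) for f, e in pairs]
    e_vocab = {w for _, e_sent in pairs for w in e_sent}
    # Uniform initialisation for every co-occurring (e, f) pair.
    t = {(e, f): 1.0 / len(e_vocab)
         for f_sent, e_sent in pairs for e in e_sent for f in f_sent}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e linked to f
        total = defaultdict(float)  # expected count of any link to f
        # E-step: distribute each target word over the source words.
        for f_sent, e_sent in pairs:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise the expected counts per source word.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


def sentence_prob(t, f_sent, e_sent, epsilon=1.0):
    """p(e|f) = epsilon / (l_f + 1)^l_e * prod_j sum_i t(e_j|f_i)."""
    f_sent = [NULL] + list(f_sent)
    prob = epsilon / len(f_sent) ** len(e_sent)
    for e in e_sent:
        prob *= sum(t.get((e, f), 0.0) for f in f_sent)
    return prob


def align(t, f_sent, e_sent):
    """Most probable source position for each target word (0 means NULL)."""
    f_sent = [NULL] + list(f_sent)
    return [max(range(len(f_sent)), key=lambda i: t.get((e, f_sent[i]), 0.0))
            for e in e_sent]


# Toy corpus for illustration only; replace with your own sentence pairs.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]

t = train_ibm1(corpus)                                # mixed case
t_lc = train_ibm1(corpus, lowercase=True)             # lowercase
t_rev = train_ibm1([(e, f) for f, e in corpus])       # opposite direction

# Table of learned word translation probabilities, grouped by source word.
for (e, f), p in sorted(t.items(), key=lambda kv: (kv[0][1], -kv[1])):
    print(f"t({e}|{f}) = {p:.3f}")

# p(e|f) and the best alignment for each sentence pair.
for f_sent, e_sent in corpus[:3]:
    print(f_sent, e_sent, sentence_prob(t, f_sent, e_sent), align(t, f_sent, e_sent))
```

Swapping the two sides of each pair before training is enough to obtain the model for the opposite direction, and the lowercase flag allows the same corpus to be trained twice for the mixed-case versus lowercase comparison.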
II)
Download the data from the Joshua Indian corpora page (google: joshua indian corpora). Ideally, you
should use only a small number of short sentences at first, so that things run quickly.
Take a set of sentences which have a prior correct reference translation. Take 5 sentences for which you
get bad output from T1-T5 translators. Translate them again, preferably from a different IP
address. Compare the outcomes.
Take the first 5 sentences of the *training* data for the language
you are working with from the Indian Parallel Corpora. Try these
sentences in T1-T5 translators (in the Hindi/Tamil/Urdu to English
direction). Compare the outputs with the English parallel sentences.
Document your observations about the different English translations. Compare all the
translators. Analyze the issues caused by phrase-based SMT.
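
As a first pass before the manual analysis, the comparison of translator outputs with the reference can be roughly quantified by token overlap. The sketch below is only one possible approach, not a prescribed metric: the function name overlap_scores, the labels T1 and T2, and the example sentences are placeholders invented for illustration, not real translator output.

```python
from collections import Counter


def overlap_scores(reference, hypothesis, lowercase=True):
    """Return (precision, recall) of hypothesis tokens against the reference."""
    ref = reference.lower().split() if lowercase else reference.split()
    hyp = hypothesis.lower().split() if lowercase else hypothesis.split()
    # Clipped matches: a reference token is matched at most as often as it occurs.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    precision = matches / len(hyp) if hyp else 0.0
    recall = matches / len(ref) if ref else 0.0
    return precision, recall


# Placeholder sentences for illustration; substitute the corpus reference
# and the actual outputs collected from the T1-T5 translators.
reference = "the book is on the table"
outputs = {
    "T1": "the book is on the table",
    "T2": "book the table on",
}

scores = {name: overlap_scores(reference, out) for name, out in outputs.items()}
for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Token overlap ignores word order, so a scrambled output can still reach full precision. Word ordering and word sense problems therefore still need to be documented by hand for each sentence.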