Assignment 03: IBM Model 1 for Machine Translation




I) Develop an IBM Model 1 for a source language (S1) and a target language (Ti), using around 20, 50, or 100 sentences.
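As a starting point, IBM Model 1 training can be sketched as a short EM loop over the word translation table t(e|f). This is a minimal sketch under assumptions, not a reference implementation: the function name `train_ibm1`, the `<NULL>` token string, the uniform initialization, and the three-sentence German/English toy corpus are all illustrative choices that do not come from the assignment.

```python
from collections import defaultdict

NULL = "<NULL>"  # hypothetical marker for the empty source word


def train_ibm1(pairs, iterations=10):
    """Estimate t(e|f) with EM from (source_tokens, target_tokens) pairs."""
    # Prepend NULL so a target word may align to no real source word.
    pairs = [([NULL] + f_sent, e_sent) for f_sent, e_sent in pairs]
    target_vocab = {e for _, e_sent in pairs for e in e_sent}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # keys are (target_word, source_word)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of f
        # E-step: distribute each target word over the source words.
        for f_sent, e_sent in pairs:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize so that sum_e t(e|f) = 1 for every f.
        t = defaultdict(
            float, {(e, f): c / total[f] for (e, f), c in count.items()}
        )
    return dict(t)


# Toy corpus (illustrative only): source = German, target = English.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm1(corpus, iterations=10)

# Print the most probable entries of the learned table.
for (e, f), p in sorted(t.items(), key=lambda kv: -kv[1])[:5]:
    print(f"t({e}|{f}) = {p:.3f}")
```

Swapping the two lists in each pair trains the model in the opposite direction, and lowercasing the tokens before training gives the lowercase variant.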
Get translations of the source sentences from translation tools T1-T5. For each translation, develop an IBM Model 1 and compare the results. Analyze each translation for problems such as picking the wrong sense of a polysemous word, word-ordering errors, or other problems. Document your observations explicitly for each sentence.
Generate a table of the word translation probabilities that were learned. List the reference sentence and the sentence translated using the IBM Model 1. Plot graphs of your results. Compare alignments with mixed case versus lowercase. Generate an alignment in the opposite direction. Calculate p(e|f) for the first three sentences.
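For the p(e|f) and alignment steps, the following sketch may help. It assumes the usual IBM Model 1 form p(e|f) = epsilon / (l_f + 1)^l_e * prod_j sum_i t(e_j|f_i), with a NULL word at source position 0. The function names, the `<NULL>` token, the default `epsilon=1.0`, and the hand-filled table `toy_t` are illustrative assumptions; the table values are made up and are not learned probabilities.

```python
NULL = "<NULL>"  # hypothetical marker for the empty source word


def sentence_probability(e_sent, f_sent, t, epsilon=1.0):
    """p(e|f) under IBM Model 1, summing over all alignments."""
    f_with_null = [NULL] + f_sent
    prob = epsilon / (len(f_with_null) ** len(e_sent))
    for e in e_sent:
        # Each target word can be generated by any source word or by NULL.
        prob *= sum(t.get((e, f), 0.0) for f in f_with_null)
    return prob


def viterbi_alignment(e_sent, f_sent, t):
    """Most probable source position for each target word (0 means NULL)."""
    f_with_null = [NULL] + f_sent
    return [
        max(range(len(f_with_null)),
            key=lambda i: t.get((e, f_with_null[i]), 0.0))
        for e in e_sent
    ]


# Hand-filled table for illustration; keys are (target_word, source_word).
toy_t = {
    ("the", NULL): 0.1, ("house", NULL): 0.1,
    ("the", "das"): 0.7, ("house", "das"): 0.1,
    ("the", "haus"): 0.2, ("house", "haus"): 0.8,
}

e_sent = "the house".split()
f_sent = "das haus".split()
print(sentence_probability(e_sent, f_sent, toy_t))
print(viterbi_alignment(e_sent, f_sent, toy_t))
```

To compare mixed case with lowercase, tokenize once with `sentence.split()` and once with `sentence.lower().split()`, then compare the resulting alignments. For the opposite direction, use a table trained with source and target swapped and call the same functions with the arguments exchanged.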
Use the following word alignment tool: http://www.cis.uni-muenchen.de/~fraser/nepal/align_browser_and_german_short.zip Customize the tool for your set of translations.

II) Download the data from the Joshua Indian corpora page (google: joshua indian corpora). Ideally, use only a small number of short sentences at first so that things run quickly. Take a set of sentences that already have a correct reference translation. Pick 5 sentences for which you get bad output from the T1-T5 translators. Translate them again, preferably from a different IP address, and compare the outcomes.
Take the first 5 sentences of the *training* data for the language you are working with from the Indian Parallel Corpora. Try these sentences in the T1-T5 translators (in the Hindi/Tamil/Urdu to English direction). Compare the outputs with the English parallel sentences. Document your observations about the different English translations. Compare all the translators. Analyze the issues caused by phrase-based SMT.
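When comparing translator outputs with the English reference, a token-level diff can make the differences easier to document. The sketch below uses Python's standard `difflib` module. The helper name `token_diff` and the example sentences are made up for illustration and are not taken from the corpus. It only highlights surface differences and does not replace the manual analysis asked for above.

```python
import difflib


def token_diff(reference, hypothesis):
    """List the non-matching token spans between reference and hypothesis."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete", "insert"
            changes.append((tag, ref[i1:i2], hyp[j1:j2]))
    return changes


# Invented example: one word differs between reference and output.
reference = "He went to the bank"
hypothesis = "He went to the shore"
for tag, ref_span, hyp_span in token_diff(reference, hypothesis):
    print(tag, ref_span, "->", hyp_span)
```

Running the helper once per translator (T1-T5) against the same reference gives a quick side-by-side view of where each output departs from it.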

References
Python notebook implementation for IBM Model 1
GIZA++ implementation
