Assignment-01

Translation score based on Adequacy and Fluency


Translate a text from a technical document (around 2 million characters) using the following translators:
T1: https://aws.amazon.com/translate/
T2: https://www.systran.net/en/translate/
T3: https://www.deepl.com/translator
T4: Google Translate
T5: Microsoft Bing Translator


Adequacy: The degree to which the target text represents the informational content of the source text, judged against the norms and meaning of the source text.

Fluency: The degree of adherence to target-language norms, referring, for example, to grammatical correctness and clarity. When judging fluency, the source text is not relevant.
Compare the quality of each sentence translation on two scales:
a) Adequacy: All Meaning 5; Most Meaning 4; Much Meaning 3; Little Meaning 2; None 1.
b) Fluency: Flawless Language 5; Good Language 4; Non-native Language 3; Disfluent Language 2; Incomprehensible 1.
You may consider around 200 sentences for human evaluation. Select sentences of varied length.
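Once the 5-point judgments are collected, per-translator averages can be tabulated with a short script. A minimal sketch; the `scores` data structure here is an assumed format, not part of the assignment:

```python
import statistics

def summarize(scores):
    """Mean adequacy and fluency per translator.

    scores: {translator_name: [(adequacy, fluency), ...]} with one
    (adequacy, fluency) judgment per sentence on the 1-5 scales above
    (assumed format).
    """
    return {name: (statistics.mean(a for a, _ in pairs),
                   statistics.mean(f for _, f in pairs))
            for name, pairs in scores.items()}
```

Reporting these averages separately for each sentence-length bucket makes the length-related issues asked about below easier to spot.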

Analyze your results and give your comments.

To be submitted:
- Cover page
- Source text in language S1; target texts translated by tools T1-T5
- Evaluation scores for each translator with different parameter settings
- Identification of issues related to language divergence, out-of-vocabulary words, sentence length and construction, reordering, alignments, ...
- Results broken down by sentence length: 1-10 words; 11-20 words; 21-40 words; 40+ words
- A list of all issues you came across while performing this task

Include all references in your submission. Prepare the submission in LaTeX (you may use Overleaf).
Assignment-03

IBM1 model for Machine Translation


Develop an IBM Model 1 for a source (S1) and target (Ti) language pair (around 20/50/100 sentence pairs).

1. Get translations from the translation tools T1-T5. For each translation, develop an IBM1 model and compare your results. Analyze each translation with respect to problems such as picking the wrong sense of a polysemous word, word-ordering problems, or other issues. Document your observations explicitly for each sentence.

- Generate a table containing the word-translation probabilities that were learned.
- List the reference sentence and the sentence translated using the IBM1 model.
- Plot graphs for your results.
- Compare alignments with mixed case versus lowercase.
- Generate an alignment in the opposite direction.
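For the alignment items above: once a table of word-translation probabilities t(e|f) is available (represented here as a dict keyed by (target, source) word pairs, an assumed representation), the best Model 1 alignment links each target word to its most probable source word. A sketch; swapping the roles of the two sides gives the opposite-direction alignment:

```python
def viterbi_align(t, src, tgt):
    """Model 1 Viterbi alignment: for each target word, return the index
    of the source word maximizing t[(e, f)].  t maps (target, source)
    word pairs to probabilities; unseen pairs default to 0."""
    return [max(range(len(src)), key=lambda i: t.get((e, src[i]), 0.0))
            for e in tgt]
```

Running it on lowercased versus mixed-case tokens (with a table trained on the correspondingly normalized corpus) gives the case comparison requested above.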

Calculate p(e|f) for the first three sentences.
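The core of IBM Model 1 is EM estimation of the word-translation table t(e|f); p(e|f) for a sentence pair then follows from the Model 1 formula (a constant length term times, for each target word, the sum of its translation probabilities over the source words). A minimal sketch, omitting the NULL word for brevity:

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=20):
    """EM training of IBM Model 1 word-translation probabilities t(e|f).
    pairs: list of (source_tokens, target_tokens) sentence pairs."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform initialization; unseen pairs fall back to this prior.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(e, f)
        total = defaultdict(float)   # expected counts c(f)
        for fs, es in pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)  # normalization for e
                for f in fs:
                    c = t[(e, f)] / z           # E-step: soft counts
                    count[(e, f)] += c
                    total[f] += c
        for (e, f), c in count.items():         # M-step: renormalize
            t[(e, f)] = c / total[f]
    return t

def sentence_prob(t, fs, es, epsilon=1.0):
    """p(e|f) under Model 1 without the NULL word: the length term
    epsilon / len(fs)**len(es) times, for each target word, the sum of
    its translation probabilities over the source words."""
    p = epsilon / (len(fs) ** len(es))
    for e in es:
        p *= sum(t[(e, f)] for f in fs)
    return p
```

On a tiny toy corpus ("das Haus" / "the house", "das Buch" / "the book", "ein Buch" / "a book") a few iterations are enough for t(house|Haus) to dominate, and `sentence_prob` gives the p(e|f) values asked for above.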

2. Download the data from the Joshua Indian corpora page (google: joshua indian corpora). Ideally, use only a small number of short sentences at first so that things run quickly.
Take a set of sentences that have prior correct reference translations. Take 5 sentences for which you get bad output from the T1-T5 translators. Translate them again, preferably from a different IP address, and compare the outcomes.


Take the first 5 sentences of the *training* data for the language
you are working with from the Indian Parallel Corpora. Try these
sentences in the T1-T5 translators (in the Hindi/Tamil/Urdu to English
direction). Compare the outputs with the English parallel sentences and document your observations about the different English translations. Compare all the translators and analyze the issues caused by phrase-based SMT.

Assignment-04

Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages


Extract syntactic translation models for an English-Indic language pair. Demonstrate your framework for different categories of sentences, taking at least 100 sentences per category. Make sure you collect sentences of different lengths (less than 10 words; 11-20 words; 21-40 words; 40+ words).
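The length categories above can be assigned automatically while collecting the corpus. A small helper; the bucket labels simply follow the assignment's brackets:

```python
def length_bucket(sentence):
    """Assign a sentence to one of the assignment's length categories,
    counting whitespace-separated words."""
    n = len(sentence.split())
    if n <= 10:
        return "1-10"
    if n <= 20:
        return "11-20"
    if n <= 40:
        return "21-40"
    return "40+"
```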
Moses for Machine Translation

Using Moses, translate and compute the evaluation scores BLEU, METEOR, TER, and WER.


Collect a parallel corpus of English-Hindi or another similar Indian-language pair. Determine the quality of the sentence pairs using statistical machine translation (Moses, phrase-based).
Use BLEU, METEOR, TER, and WER measures to assess the quality of the MT output.
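BLEU, METEOR, and TER are easiest to obtain from standard tools (for example, the sacrebleu package provides corpus-level BLEU and TER). WER is word-level edit distance divided by reference length, and is simple enough to sketch directly:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance (substitutions,
    insertions, deletions) divided by the reference length."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution/match
    return d[len(r)][len(h)] / len(r)
```

Note the directions: WER and TER decrease with quality, while BLEU and METEOR increase, so report each metric's direction alongside the scores.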
Assignment-05

NMT English-Hindi


Develop an English-to-Hindi translation system using at least two different NMT architectures. Use different configurations of these systems. Include a discussion of the models as well.
Analyze the performance using various evaluation metrics.