Assignment-04: Extraction of Syntactic Translation Models from Parallel Data using Syntax from Source and Target Languages
Problem Statement:
Extraction of Syntactic Translation Models for an English-Indic language pair
Demonstrate your framework on different categories of sentences. Take at least 100 sentences for each category. Make sure you collect sentences of different lengths (10 words or fewer; 11-20 words; 21-40 words; more than 40 words).
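One possible way to check the length coverage of a collected category is sketched below. It counts whitespace-separated tokens, which is an assumption (a proper tokenizer may count differently, especially on the Indic side). The bucket boundaries are also an assumption: inclusive upper bounds at 10, 20 and 40 words are used so that every sentence falls into exactly one bucket. The names length_bucket and bucket_counts are hypothetical helpers, not part of any library.

```python
from collections import Counter

# Inclusive upper bounds for each bucket (an assumption, see the note above).
BUCKETS = [(10, "<=10"), (20, "11-20"), (40, "21-40")]


def length_bucket(sentence: str) -> str:
    """Return the length bucket of a sentence, counting whitespace-separated tokens."""
    n = len(sentence.split())
    for upper, label in BUCKETS:
        if n <= upper:
            return label
    return "40+"


def bucket_counts(sentences):
    """Count how many sentences fall into each length bucket."""
    return Counter(length_bucket(s) for s in sentences)


if __name__ == "__main__":
    sample = [
        "The Sun rises in the East",
        "Rahul did not complete his homework, so the teacher punished him.",
    ]
    print(bucket_counts(sample))
```

Running bucket_counts once per category makes it easy to spot a category that has 100 sentences overall but none in the longest bucket.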
Type Example
Declarative Sentences (The Sun rises in the East)
Interrogative Sentences (What are the major differences between metals and non-metals?)
Exclamatory Sentences (That was a great match!)
Imperative Sentences (Please pick up the notes when you come.)
Compound Sentences (Rahul did not complete his homework, so the teacher punished him.)
Complex Sentences (The children were asked to go home because it was too late.)
Long-distance dependencies (The man next to the large oak tree near the grocery store on the corner is tall.
The men next to the large oak tree near the grocery store on the corner are tall.
The bird next to the large oak tree near the grocery store on the corner flies rapidly.
The man next to the large oak tree near the grocery store on the corner talks rapidly.)
News Headlines (Incessant heavy rainfall causes waterlogging in parts of Delhi)
Active Sentences (She bought a new car. Passive voice: A new car was bought by her.)
Passive sentences:
Reversible (The baby was kissed on the head by the lady)
Nonreversible (The milk was spilled by the boy)
Ambiguous (The fish was eaten)
OR Sentences (Sometimes he would take care of the whole flock while the shepherd was resting eating his dinner.)
(The goat that the pig had bumped near the bush was smiling)
SVO Structured (The boy smiling is consoling the little girl crying)
Idioms (The spirit is willing but the flesh is weak) (सोने पर सुहागा)
You may collect sentences from: https://sentence.yourdictionary.com/
Learn a Language With Sentence Mining
In addition, include some ill-formed translations and explain why they are ill-formed.
(The train that the knife had helped under the square was cold)
Explain your methodology
Examine your approach using any one, or a combination, of the following approaches:
Hierarchical Phrase-based Translation
Inversion transduction grammar (ITG)
Bracketing Transduction Grammar (BTG)
Panini grammar
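To make the reordering idea behind ITG/BTG concrete, here is a minimal sketch of a bracketing transduction grammar derivation. It assumes a binary tree whose internal nodes apply either the straight rule [A B] (same order in both languages) or the inverted rule <A B> (order swapped on the target side). The class names Leaf and Node, the function yields, and the transliterated Hindi word pairs are illustrative assumptions, not output of any extraction tool.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    src: str  # source-language word
    tgt: str  # target-language word (transliterated Hindi, illustrative)


@dataclass
class Node:
    left: "Tree"
    right: "Tree"
    inverted: bool = False  # False: straight rule [A B]; True: inverted rule <A B>


Tree = Union[Leaf, Node]


def yields(tree: Tree) -> Tuple[List[str], List[str]]:
    """Return the (source words, target words) generated by a BTG derivation."""
    if isinstance(tree, Leaf):
        return [tree.src], [tree.tgt]
    left_src, left_tgt = yields(tree.left)
    right_src, right_tgt = yields(tree.right)
    src = left_src + right_src  # the source side always reads left to right
    tgt = right_tgt + left_tgt if tree.inverted else left_tgt + right_tgt
    return src, tgt


if __name__ == "__main__":
    # English is SVO and Hindi is SOV, so the verb-object node is inverted.
    obj = Node(Leaf("a", "ek"), Node(Leaf("new", "nayi"), Leaf("car", "kaar")))
    vp = Node(Leaf("bought", "kharidi"), obj, inverted=True)
    sentence = Node(Leaf("She", "usne"), vp)
    src, tgt = yields(sentence)
    print(" ".join(src))  # She bought a new car
    print(" ".join(tgt))  # usne ek nayi kaar kharidi
```

A single inverted node is enough to move the verb to the end of the clause here. A real extraction pipeline would have to induce such trees from word-aligned parallel data rather than build them by hand.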
To be reported:
Total number of Sentences
Number of words
Number of unique words
Sentence-level evaluation metrics: BLEU, METEOR, Spearman correlation, ...
Check the sentence type in the source and in the translated language: is the form the same, different, or one-to-many?
Perplexity vs different dataset sizes
Perplexity vs different length of sentences
Perplexity vs type of sentences
Perplexity vs iterations
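The corpus counts and a perplexity figure from the list above could be computed along the following lines. This is only a sketch under stated assumptions: tokens are lowercased and split on whitespace, and perplexity is taken under an add-one-smoothed unigram model, which is a stand-in for whatever model your framework actually trains. The function names corpus_stats and unigram_perplexity are hypothetical.

```python
import math
from collections import Counter


def tokenize(sentence: str):
    """Lowercase and split on whitespace (an assumption, not a real tokenizer)."""
    return sentence.lower().split()


def corpus_stats(sentences):
    """Return the sentence count, word count and unique word count of a corpus."""
    tokens = [w for s in sentences for w in tokenize(s)]
    return {
        "sentences": len(sentences),
        "words": len(tokens),
        "unique_words": len(set(tokens)),
    }


def unigram_perplexity(train, test):
    """Perplexity of `test` under an add-one-smoothed unigram model fit on `train`."""
    counts = Counter(w for s in train for w in tokenize(s))
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves probability mass for unseen words
    test_tokens = [w for s in test for w in tokenize(s)]
    if not test_tokens:
        raise ValueError("test set contains no tokens")
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab)) for w in test_tokens)
    return math.exp(-log_prob / len(test_tokens))


if __name__ == "__main__":
    train = ["The Sun rises in the East", "That was a great match!"]
    print(corpus_stats(train))
    print(unigram_perplexity(train, ["The match was great"]))
```

Calling unigram_perplexity on training subsets of growing size, on test sets grouped by length bucket, or on test sets grouped by sentence type gives the data points for the corresponding perplexity comparisons.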
Also summarize issues and challenges faced and the ways to mitigate them.
List possible application areas of your work.