With the application of deep learning to speech and natural language processing, the accuracy of speech recognition and machine translation has steadily improved. We provide a machine translation dataset of over 10 million English-Chinese parallel sentence pairs. The data consists of conversational English extracted from English-learning websites and movie subtitles. All parallel data has been checked by human annotators to ensure data size, domain relevance, and quality.
Training Set: 10,000,000 sentences
Validation (Simultaneous Interpretation) Set: 934 sentences
Validation (Machine Translation) Set: 8,000 sentences
Each English-Chinese sentence pair consists of an English sentence and a Chinese sentence, where the Chinese sentence was translated from the English sentence by human annotators. The dataset contains two files: the Chinese file contains the Chinese sentences and the English file contains the corresponding English sentences. Sentences in the two files correspond one-to-one.
Sample English sentences:
With fruit growing all year round, this is indeed a paradise for birds.
I dropped Henry at your office an hour ago.
Father and son, two bricklayers, are sitting in a cafe arguing about a car.
A small, disciplined militia can not only hold out against a larger force but drive it back.
I start to sweat when I worry about people noticing my sweat.
I fly to Florida a couple of times a year to visit the folks.
I just ended a five-month relationship an hour ago.
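The one-to-one correspondence between the English and Chinese files can be sketched in Python as follows. This is a minimal sketch and not an official loader: it assumes each file is UTF-8 text with one sentence per line and that line i of one file matches line i of the other. The function name `read_parallel` and any file paths passed to it are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, Tuple


def read_parallel(en_path: str, zh_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (english, chinese) sentence pairs from two line-aligned files.

    Assumptions (not confirmed by the dataset description): UTF-8 encoding,
    one sentence per line, and the same line order in both files.
    """
    with open(en_path, encoding="utf-8") as en_file, \
         open(zh_path, encoding="utf-8") as zh_file:
        # zip_longest pads the shorter file with None, so a length mismatch
        # is detected instead of being silently truncated.
        pairs = zip_longest(en_file, zh_file)
        for line_no, (en, zh) in enumerate(pairs, start=1):
            if en is None or zh is None:
                raise ValueError(
                    f"files differ in length: no match for line {line_no}"
                )
            yield en.rstrip("\n"), zh.rstrip("\n")
```

Calling `list(read_parallel(en_path, zh_path))` on the two files would give a list of sentence pairs. Raising an error on a length mismatch is a design choice made here so that a misaligned download is caught before training.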
For any copyright-related inquiries, please contact email@example.com.