Releases: gorkemozkaya/nmt-en-tr
Pretrained en->tr and tr->en NMT models, trained with TF2
The models were re-trained using TensorFlow 2 with a larger training corpus than the prior release. Please see this notebook for loading these checkpoints; a minimal sketch is also given below.
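As a rough illustration of the TF2 checkpoint-restore mechanism, here is a minimal sketch. The stand-in model and the checkpoint path are placeholders, not the actual architecture; the linked notebook remains the authoritative reference.

```python
import tensorflow as tf

# Stand-in model: the real architecture is the Transformer defined in the
# notebook linked above; any tf.keras.Model restores the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(8)])

# Restore the latest TF2 checkpoint from a directory of checkpoint files.
# "checkpoints/en-tr" is a hypothetical path; point it at the directory
# where the downloaded release assets were extracted.
ckpt = tf.train.Checkpoint(model=model)
latest = tf.train.latest_checkpoint("checkpoints/en-tr")
if latest is not None:
    ckpt.restore(latest).expect_partial()
```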
blended_dataset
Releasing a blended dataset created from a combination of four different parallel corpora. The preparation code is available here. The most dominant source is the OpenSubtitles en/tr corpus, which is downsampled to 10% of its original size; the other corpora are used in full. A sketch of this kind of blending follows.
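For illustration, a minimal version of this blending in plain Python might look like the following. The file names are hypothetical stand-ins; the linked preparation code is the actual recipe.

```python
import random

random.seed(0)  # make the downsampling reproducible

def read_lines(path):
    """Read a parallel corpus file as a list of lines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()

# Hypothetical file names for the four parallel corpora.
open_subtitles = read_lines("opensubtitles.en-tr.tsv")
other_corpora = [read_lines(p)
                 for p in ("bianet.en-tr.tsv", "ted.en-tr.tsv",
                           "fourth_corpus.en-tr.tsv")]

# Keep ~10% of OpenSubtitles, 100% of the other corpora, then shuffle.
blended = [line for line in open_subtitles if random.random() < 0.10]
for corpus in other_corpora:
    blended.extend(corpus)
random.shuffle(blended)
```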
bianet_and_ted_corpora
Adding two additional corpora (Bianet and TED) that are used in the latest version of the neural machine translation model.
pretrained_models
This release contains the pre-trained model weights for the English->Turkish and Turkish->English translation models. Documentation for loading these models will follow soon.
Raw data
The raw data used for training these NMT models. These files were downloaded from http://opus.nlpl.eu and are included here so the models can be reproduced.
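For reference, OPUS corpora can also be fetched programmatically. The URL below follows OPUS's common download pattern but is an assumption; check http://opus.nlpl.eu for the exact corpus name, version, and path.

```python
import urllib.request

# Hypothetical URL in the usual OPUS layout (corpus/version/format/pair);
# verify the exact path on the OPUS website before downloading.
url = ("https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2018/"
       "moses/en-tr.txt.zip")
urllib.request.urlretrieve(url, "opensubtitles-en-tr.zip")
```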