
Releases: gorkemozkaya/nmt-en-tr

Pretrained en->tr and tr->en NMT models, trained with TF2

16 Jul 18:31
53f4dd6


The models are re-trained with TensorFlow 2 on a larger training corpus than the prior release. Please see this notebook for loading these checkpoints.
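Restoring the weights follows the standard TF2 checkpoint workflow. The sketch below is a minimal illustration, not the repository's exact API: the model architecture and the `checkpoints/en_tr` directory are placeholders, and the linked notebook shows the actual loading steps.

```python
import tensorflow as tf

# Placeholder model; the real en->tr model architecture is defined in the notebook.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model.build((None, 8))  # build so the variables exist before restoring

# Wrap the model in a Checkpoint object and restore the newest checkpoint
# file from the (assumed) download directory, if one is present.
ckpt = tf.train.Checkpoint(model=model)
latest = tf.train.latest_checkpoint("checkpoints/en_tr")
if latest is not None:
    # expect_partial() silences warnings about optimizer slots that an
    # inference-only model does not define.
    ckpt.restore(latest).expect_partial()
```
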

blended_dataset

10 Jul 19:30
7b4c4c0
Pre-release

Releasing a blended dataset created by combining four different parallel corpora. The preparation code is available here. The most dominant dataset is the Open Subtitles en/tr corpus, which is downsampled to 10% of its original size; the other datasets are used in full.
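The downsample-and-blend step can be sketched as follows. This is a minimal illustration with made-up corpus contents and a simple per-pair sampling rule; the actual preparation code is linked above.

```python
import random

def blend(corpora, rates, seed=0):
    """Blend parallel corpora, keeping each sentence pair with the
    per-corpus sampling rate (e.g. 0.1 downsamples a corpus to ~10%)."""
    rng = random.Random(seed)
    blended = []
    for pairs, rate in zip(corpora, rates):
        blended.extend(p for p in pairs if rng.random() < rate)
    rng.shuffle(blended)  # mix the corpora so batches are not corpus-ordered
    return blended

# Toy stand-ins for the real corpora.
open_subtitles = [(f"en {i}", f"tr {i}") for i in range(10000)]
ted = [(f"en ted {i}", f"tr ted {i}") for i in range(100)]

# Open Subtitles at 10%, the smaller corpus at 100%.
mixed = blend([open_subtitles, ted], [0.10, 1.0])
```
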

bianet_and_ted_corpora

05 Jul 05:31
Pre-release

Adding two additional corpora (Bianet and TED) that are used in the latest version of the neural machine translation model.

pretrained_models

15 Jul 00:00

This release contains the pre-trained model weights for the English->Turkish and Turkish->English translation models. Documentation for loading these models will follow soon.

Raw data

06 Jul 18:40

The raw data used for training these NMT models. The data were downloaded from http://opus.nlpl.eu and are included here for reproducibility.