I'm trying to reproduce the bAbI joint-training results from the Universal Transformer paper (UT w/o ACT). My exact scripts and flags are in the files attached at the end; a rough sketch of the commands is below.
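(Sketch only — the `--problem`, `--model`, and `--hparams_set` names here are my guesses from this repo's class names and T2T's usual registry conventions, not verified values; see the attached flags files for what I actually ran:)

```bash
# Hedged sketch, not my literal scripts: names below are assumptions.
# "babi_qa_sentence" assumes T2T's usual CamelCase -> snake_case registry
# naming for this repo's BabiQaSentence problem.
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_tmp
OUT_DIR=$HOME/t2t_train/babi_ut

# Generate the bAbI data.
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=babi_qa_sentence

# Train UT without ACT (model and hparams-set names assumed).
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --problem=babi_qa_sentence \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --train_steps=100000
```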
However, I can't reproduce the results: I get a test accuracy of around 60% (I didn't train for the full 100,000 steps, but the curve already seems to have plateaued). In particular, I'm unsure about three things:
In `transformer_base`, the default `batch_size` is 4096: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1312
Unlike T2T's `BabiQaConcat`, which inherits from `TextProblem`, this repo's `BabiQaSentence` inherits from `Problem`, where `batch_size_means_tokens` is set to `False`. So 4096 means quite a large batch (4096 × 70 × 12 ≈ 3.4M tokens). I got an OOM error on a 1080 Ti card, so I changed `batch_size` to 512.
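(For reference, this is the kind of override I applied — a sketch assuming `t2t-trainer`'s standard `--hparams` flag, which takes comma-separated `name=value` overrides; other flag values as in the sketch above:)

```bash
# Override batch_size at launch time instead of editing the hparams set.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --problem=babi_qa_sentence \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --hparams='batch_size=512' \
  --train_steps=100000
```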
In a 3 Sep commit (tensorflow/tensor2tensor@e496897), you changed the default `transformer_ffn_type` from `sepconv` to `fc`.
Should I use `sepconv` to run the experiments?
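(If `sepconv` is the intended setting, I assume it can be restored with an hparams override rather than by reverting the commit — the hparam name below is taken from that commit:)

```bash
# Assumed override to restore the pre-commit default (sepconv), combined
# with the reduced batch size.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR \
  --problem=babi_qa_sentence \
  --model=universal_transformer \
  --hparams_set=universal_transformer_base \
  --hparams='batch_size=512,transformer_ffn_type=sepconv' \
  --train_steps=100000
```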
The T2T codebase has changed a lot since this repo was released. Will that affect the results?
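(To rule out version drift, I could pin T2T and TensorFlow to a release roughly contemporary with this repo — the version numbers below are placeholders, not known-good values:)

```bash
# Hypothetical version pin; 1.9.0 / 1.12.0 are placeholder versions
# from around the repo's era, not verified-compatible ones.
pip install tensor2tensor==1.9.0 tensorflow-gpu==1.12.0
```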
What `batch_size` did you actually use? Did you change any other hparams when running `t2t-datagen` and `t2t-trainer`?
It would be very helpful if you could share your `flags.txt`, `flags_t2t.txt`, and `hparams.json` files. Attached are mine:
flags.txt
flags_t2t.txt
hparams.json.txt