Randomness introduced when loading cache files in JSON #13

ZhouYzzz · 2019-04-21T04:49:39Z

I have been troubled by randomness in ImageNetVID and finally found the cause. In some versions of Python, e.g. Python 3.5, the cache files in JSON format are loaded in an unpredictable key order. This does not happen with Python 2.7 or Python 3.6. It kept me from reproducing my experiments, since a different order of the training data between runs leads to different gradients in the early epochs when training SiamFC. I suggest caching the dataset in a more stable way, e.g. using numpy or cPickle, or loading it into an OrderedDict or similar.

Details:

import json
with open('imagenet_vid_train/seq_dict.json', 'r') as f:
    seq_dict = json.load(f)
seq_names = list(seq_dict)  # names follow the dict's iteration order
print(seq_names[0])

gives ILSVRC2015_train_00000000.0 on both of two runs with Python 3.6 (completely in order),
gives ILSVRC2015_train_00646001.0 on both runs with Python 2.7 (not in order, but repeatable),
but gives ILSVRC2015_train_00053009.1 on one run and ILSVRC2015_train_00047000.2 on the next with Python 3.5.

ZhouYzzz · 2019-04-21T04:56:55Z

This behavior cannot be controlled by manual seeding, which is what makes me uncomfortable. I think everything should be in a fixed order before it reaches e.g. torch.utils.data.DataLoader, or at least be repeatable under the same seed.
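
For context, a rough sketch of why seeding does not help here. As far as I understand it, plain dicts before CPython 3.6 iterate str keys in hash order, and str hashing is randomized per process by default on Python 3.3 to 3.5. That randomization is governed by the PYTHONHASHSEED environment variable, which is read at interpreter startup, so random.seed or torch.manual_seed cannot affect it. The helper name dict_order_may_vary is hypothetical and not part of the toolkit.

```python
import os
import sys


def dict_order_may_vary():
    """Return True if plain-dict iteration order can differ between runs.

    Hypothetical helper for illustration; not part of the got10k toolkit.
    """
    # CPython 3.6+ dicts keep insertion order (a language guarantee from 3.7).
    if sys.version_info >= (3, 6):
        return False
    # A fixed integer PYTHONHASHSEED makes str hashes repeatable across runs.
    seed = os.environ.get("PYTHONHASHSEED", "random")
    if seed.isdigit():
        return False
    # Otherwise the order varies only if hash randomization is switched on
    # (on by default in Python 3.3 to 3.5, off by default in Python 2.7).
    return bool(sys.flags.hash_randomization)


if __name__ == "__main__":
    print("PYTHONHASHSEED =", os.environ.get("PYTHONHASHSEED"))
    print("dict order may vary between runs:", dict_order_may_vary())
```

Launching the script with a fixed PYTHONHASHSEED would make the order repeatable on Python 3.5, but fixing the order inside the loader is more robust because it does not rely on how the process was started.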

ZhouYzzz · 2019-04-21T05:16:19Z

toolkit/got10k/datasets/vid.py

Line 41 in c5d3cb1

self.seq_dict = self._cache_meta()

A simple solution is to change this line to
self.seq_dict = OrderedDict(sorted(self._cache_meta().items()))
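
A minimal sketch of what this one-line change does, using a toy dict in place of the real cached metadata (the empty lists and the helper name sort_seq_dict are placeholders of mine, not the toolkit's data or API). Note that OrderedDict has to be imported from collections.

```python
from collections import OrderedDict


def sort_seq_dict(seq_dict):
    """Return a copy of seq_dict whose keys iterate in sorted order."""
    # Keys are unique, so sorting the (key, value) pairs orders by key alone.
    return OrderedDict(sorted(seq_dict.items()))


# Toy stand-in for the cached metadata; the real values are not shown here.
raw = {
    "ILSVRC2015_train_00053009.1": [],
    "ILSVRC2015_train_00000000.0": [],
    "ILSVRC2015_train_00047000.2": [],
}

seq_dict = sort_seq_dict(raw)
seq_names = list(seq_dict)
print(seq_names[0])  # ILSVRC2015_train_00000000.0
```

Sorting makes the order independent of both the Python version and the order in which the cache file was written.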

huanglianghua · 2019-04-22T05:17:11Z

Thanks for reporting the randomness issue and also proposing a solution. Using OrderedDict is a great idea for reproducibility. We'll add this feature in our later revision.

huanglianghua · 2019-04-23T04:58:45Z

Hi @ZhouYzzz, the seq_dict is already an OrderedDict in the old implementation. We have fixed the randomness by replacing line 73 with

seq_dict = json.load(f, object_pairs_hook=OrderedDict)

You could access the revision using pip install --upgrade git+https://github.com/got-10k/toolkit.git@master.
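
To illustrate the effect of object_pairs_hook, here is a small self-contained sketch. The JSON text is a made-up stand-in for the cache file, with keys deliberately out of sorted order; only json.load and OrderedDict come from the fix itself.

```python
import io
import json
from collections import OrderedDict

# Made-up JSON text standing in for the cache file; keys are unsorted on purpose.
text = u'{"b.0": [1], "a.1": [2], "c.2": [3]}'

with io.StringIO(text) as f:
    # The hook receives the key/value pairs in the order they appear in the
    # file, so the resulting OrderedDict preserves that order on any version.
    seq_dict = json.load(f, object_pairs_hook=OrderedDict)

seq_names = list(seq_dict)
print(seq_names)  # ['b.0', 'a.1', 'c.2']
```

Unlike sorting the items, this keeps whatever order the cache file was written in, so runs are repeatable as long as the same cache file is reused.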

ZhouYzzz · 2019-04-23T05:05:52Z

That is great, thank you!

huanglianghua closed this as completed Apr 24, 2019
