
Parallelizing data load and training #68

Open
mallela opened this issue Feb 7, 2019 · 1 comment

Comments


mallela commented Feb 7, 2019

Hello!

I read in another issue that you load data and perform training in parallel. I was wondering how exactly you do that, because the bottleneck does not seem to be training (~0.06 s per step) but the data pre-processing/fetching (augmentation with an imgaug Sequential takes ~0.8 s; loading the .h5 file takes ~0.2 s). I am using a batch size of 120.

Are you using multiprocessing or the TF data input pipeline?

Thanks,
Praneeta

@markus-hinsche

Hi Praneeta!
In TensorFlow, the Dataset.map() method has a num_parallel_calls parameter, which runs the mapped preprocessing function on several elements concurrently.

See how we use it in our training implementation of this paper:
https://github.com/merantix/imitation-learning/blob/master/imitation/input_fn.py#L100
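For anyone landing here, a minimal sketch of that pattern with the tf.data API (the file name "train.tfrecord", the load_and_augment function, and the thread count of 8 are illustrative, not taken from the repo's input_fn):

```python
import tensorflow as tf

def load_and_augment(example):
    # Stand-in for the expensive per-example work described above
    # (HDF5 loading + imgaug augmentation). Heavy Python code like
    # imgaug must be wrapped, e.g. via tf.py_function (tf.py_func in
    # older TF 1.x releases), to run inside a tf.data pipeline.
    return example

dataset = tf.data.TFRecordDataset(["train.tfrecord"])
dataset = dataset.map(load_and_augment, num_parallel_calls=8)  # preprocess 8 examples at once
dataset = dataset.batch(120)   # batch size from the question above
dataset = dataset.prefetch(1)  # prepare the next batch while the current one trains
```

Note that prefetch(1) is what actually overlaps data loading with training: while the model consumes one batch, the pipeline builds the next in the background, whereas num_parallel_calls only parallelizes the per-example preprocessing itself.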
