
Parallelizing data load and training #68

Open
mallela opened this issue Feb 7, 2019 · 1 comment

Comments


mallela commented Feb 7, 2019

Hello!

I read in another issue that you load data and perform training in parallel. I was wondering how exactly you do that, because the bottleneck does not seem to be training (~0.06 s per step) but the data pre-processing/fetching (augmentation with an imgaug Sequential takes ~0.8 s; loading the .h5 file takes ~0.2 s). I am using a batch size of 120.

Are you using multiprocessing or the TF data input pipeline?

Thanks,
Praneeta

@markus-hinsche

Hi Praneeta!
In TensorFlow, the Dataset.map() method has a num_parallel_calls parameter, which runs the mapped preprocessing function on several elements concurrently.

See how we use it in our training implementation of this paper:
https://github.com/merantix/imitation-learning/blob/master/imitation/input_fn.py#L100
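For anyone landing here, a minimal sketch of that pattern with the tf.data API (the file name "train.tfrecord", the load_and_augment function, and the thread count of 8 are illustrative, not taken from the repo's input_fn):

```python
import tensorflow as tf

def load_and_augment(example):
    # Stand-in for the expensive per-example work described above
    # (HDF5 loading + imgaug augmentation). Heavy Python code like
    # imgaug must be wrapped, e.g. via tf.py_function (tf.py_func in
    # older TF 1.x releases), to run inside a tf.data pipeline.
    return example

dataset = tf.data.TFRecordDataset(["train.tfrecord"])
dataset = dataset.map(load_and_augment, num_parallel_calls=8)  # preprocess 8 examples at once
dataset = dataset.batch(120)   # batch size from the question above
dataset = dataset.prefetch(1)  # prepare the next batch while the current one trains
```

Note that prefetch(1) is what actually overlaps data loading with training: while the model consumes one batch, the pipeline builds the next in the background, whereas num_parallel_calls only parallelizes the per-example preprocessing itself.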
