Skip to content

Jobs on GPU #230

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 10 commits into from
Jan 2, 2020
Merged

Jobs on GPU #230

merged 10 commits into from
Jan 2, 2020

Conversation

hhsecond
Copy link

@hhsecond hhsecond commented Sep 27, 2019

  • CircleCI config for running GPU jobs
  • Test cases running on GPU Device

I am not using any of the existing build utilities such as system-setup.py since relying on a file inside the repo trigger rebuilding of that layer on each push which is time-consuming. We need to iterate over this and other build processes to unify the build processes and to do the packaging and release of GPU docker image etc

Also, because of the allow growth feature in TF, the test cases on GPU takes a lot of time now

@hhsecond hhsecond force-pushed the gputests branch 3 times, most recently from 46366ff to 1073b8c Compare October 2, 2019 08:13
@hhsecond hhsecond changed the title [WIP] Jobs on GPU Jobs on GPU Oct 2, 2019
@hhsecond hhsecond requested review from lantiga and rafie October 2, 2019 08:16
lantiga
lantiga previously approved these changes Oct 2, 2019
Copy link
Contributor

@lantiga lantiga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Now that we know this works, we should indeed consider moving the installation of deps to system_setup.py to get non-dockerized setups for free and for consistency with the rest of the configurations.

@hhsecond
Copy link
Author

hhsecond commented Oct 3, 2019

@lantiga @rafie I agree with you on moving the dependency installations to system-setup.py but the fact that docker layer caching doesn't work on ADD for some reason causes all of the dependency installation to reruns on all the builds. This seems inefficient for me especially since GPU resources are costly. Let's see if CircleCI folks have some idea about this and I'll move to system-setup anyways

@hhsecond
Copy link
Author

hhsecond commented Oct 3, 2019

@lantiga @rafie
I have removed all the dependency installations from the docker file and relying on system-setup.py for most of the part. But dependencies required for the Redis installations are still in the docker file. It is because docker layer caching we had enabled wasn't working as expected and putting those dependencies also to system-setup will cause the job to take quite a lot of time to finish. I have raised a complaint with them already. Let see what they have got.
Also, the GPU backend is disabled in circle CI from today and they are going to send us a contract soon.
LET'S NOT MERGE THIS TILL THE GPU SUPPORT IS BACK

@lantiga
Copy link
Contributor

lantiga commented Oct 4, 2019

@hhsecond Thank you Sherin, great job. Given the costs associated with building on GPU, it's really important we don't waste precious compute minutes building.

@hhsecond hhsecond requested review from rafie and lantiga December 30, 2019 15:39
lantiga
lantiga previously approved these changes Dec 31, 2019
Copy link
Contributor

@lantiga lantiga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@lantiga
Copy link
Contributor

lantiga commented Dec 31, 2019

(@hhsecond please address Rafi’s comments before merging)

@lantiga
Copy link
Contributor

lantiga commented Jan 2, 2020

Hey @hhsecond, are we good to go?

@hhsecond hhsecond merged commit 7d99147 into master Jan 2, 2020
@hhsecond hhsecond deleted the gputests branch January 2, 2020 10:27
@hhsecond hhsecond mentioned this pull request Jan 2, 2020
lantiga pushed a commit that referenced this pull request May 6, 2020
 
Jobs on GPU (#230)

- GPU build on CI
- ORT CUDA compatibility issue fix
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants