
Questions about compatible version of torchvision #4

Open
jp7c5 opened this issue Dec 17, 2020 · 2 comments

jp7c5 commented Dec 17, 2020

Hello. Thanks for sharing this project.

I could install nimble following the installation guide.
It seems that the torch version is "1.4.0a0+61ec0ca".
To use torch with torchvision, I installed torchvision with the following command (the CUDA 10.2 build):
pip install torchvision==0.5.0 -f https://download.pytorch.org/whl/cu102/torch_stable.html
Since this reinstalls a different version of PyTorch, I removed that PyTorch and rebuilt Nimble.
I'm not sure whether this method is correct, but I could import both torch==1.4.0a0+61ec0ca and torchvision==0.5.0 anyway.

However, I'm getting an error that seems to be related to torchvision. For example,
import torch
torch.ops.torchvision.nms
generates a runtime error
RuntimeError: No such operator torchvision::nms.

Since the example code in the README uses torchvision, could you let me know how to install a version of torchvision that is compatible with Nimble?
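(An aside, not from the thread itself: the `torch.ops.torchvision.*` operators are only registered once the torchvision package has been imported, so calling the op after a bare `import torch` raises exactly this "No such operator" error. A quick sanity check, assuming a matching torch/torchvision pair, looks like:)

```python
import torch
import torchvision  # importing torchvision registers its C++ ops with torch

# Hypothetical sanity check: this lookup raises
# "RuntimeError: No such operator torchvision::nms" if registration failed.
boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 10.0, 10.0]])
scores = torch.tensor([0.9, 0.8])
keep = torch.ops.torchvision.nms(boxes, scores, 0.5)
print(keep)
```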

@gyeongin
Contributor

When we build PyTorch from source, we should also build torchvision from source because of the exact issue you've mentioned: pip-installing torchvision reinstalls a different version of PyTorch.

You should:

  1. clone torchvision repo
  2. check out the v0.5.0 tag (because torchvision v0.5.0 is the latest version compatible with PyTorch v1.4.1)
  3. run python setup.py install
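The steps above can be sketched as a shell session (the repository URL is the official torchvision repo; adjust paths to your environment):

```shell
# Build torchvision v0.5.0 from source against the already-built PyTorch.
# Run inside the Python environment that contains the Nimble build.
git clone https://github.com/pytorch/vision.git
cd vision
git checkout v0.5.0        # latest tag compatible with PyTorch 1.4.x
python setup.py install    # compiles the C++/CUDA ops against the local torch
```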

Note that running torchvision's NMS operation with Nimble is problematic.
Nimble is built for optimized GPU task scheduling, so the PyTorch module passed to Nimble should perform all computation on GPU.
However, torchvision's NMS implementation does not satisfy this constraint, as it performs some logic on CPU.

You can try these two options.

  1. Carve out the "GPU-only", "static" part(s) of your PyTorch module, apply Nimble to those parts separately, and wire the resulting Nimble modules together with the rest of your PyTorch module.
  2. Adopt a GPU-only NMS implementation. TensorRT's batchedNMSPlugin or nmsPlugin could be a good choice.
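(To make option 2 concrete, here is a minimal NumPy sketch of the NMS algorithm itself; names and conventions are my own, not from TensorRT or torchvision. A Nimble-friendly version would express the same logic purely with torch tensor ops so every step stays on the GPU.)

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) arrays of [x1, y1, x2, y2]; returns the
    indices of kept boxes, highest score first. Uses the plain w = x2 - x1
    convention (no +1)."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # drop boxes that overlap box i too much; keep the rest for later rounds
        order = rest[iou <= iou_threshold]
    return keep
```

The data-dependent loop above is exactly what makes NMS hard to capture as a "static" GPU task graph, which is why option 1 suggests carving NMS out and applying Nimble only to the fixed-shape parts of the network.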


jp7c5 (Author) commented Dec 18, 2020

Thanks for the quick reply.

Following your suggestion, I built torchvision from source and, surprisingly, the nms-related error no longer shows up.
However, I'm still getting the following error:
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'.
I saw #1 , so is this expected given the current status?

Without the distributed setting, the default training code runs smoothly.
While applying Nimble to this single-GPU setup, I noticed that the model to be wrapped by Nimble must follow a strict input and output format (mostly consisting of torch Tensors).
I don't know if this is a hard requirement, but if not, relaxing this condition would make Nimble easier to use :)
