CUDA/CuDNN related errors occur in Titan-RTX environments #39

dogyoonlee · 2020-09-10T13:02:28Z

Hello.

I have changed my environment in many ways,
but I could not find a way to run your code.

My GPU is a Titan RTX,
and my attempts are as follows.

I also tried running the code with CUDA 8.0 earlier, but the errors were
almost the same as with CUDA 9.0.

---environment---
ubuntu 18.04
CUDA 9.0
CuDNN 7.1
torch 0.3.1 / 0.4.0
==>
error message :
Found GPU0 TITAN RTX which requires CUDA_VERSION >= 9000 for
optimal performance and fast startup time, but your PyTorch was compiled
with CUDA_VERSION 8000. Please install the correct PyTorch binary
using instructions from http://pytorch.org

warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))
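
For anyone reproducing this, here is a minimal sketch for printing the versions that PyTorch itself reports, since the warning above names a binary compiled with CUDA_VERSION 8000 even though CUDA 9.0 is installed on the system. The function name describe_torch_cuda is hypothetical; the torch attributes it reads are standard ones. As a side note and to the best of my knowledge, the Titan RTX is a Turing card (compute capability 7.5), which is supported from CUDA 10.0 onward, so binaries built against older toolkits may lack kernels for it.

```python
def describe_torch_cuda():
    """Collect the versions relevant to a CUDA/PyTorch mismatch."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # CUDA toolkit the PyTorch binary was built with (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # cuDNN version PyTorch is linked against (None if unavailable).
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) tuple describing the GPU architecture.
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(key, value)
```

If built_with_cuda differs from the system CUDA version, the installed wheel rather than the system toolkit decides which GPUs are supported.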

The process is then "Killed" when the data are loaded onto the GPU, specifically at the conv2d() call
made through self.mlp[i] on line 55 of pointnet2_modules.py, in _PointnetSAModuleBase.

---environment---
ubuntu 18.04
CUDA 9.0
CuDNN 7.1
torch 0.3.1 / 0.4.1
==>
error message :
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663

---environment---
ubuntu 18.04
CUDA 9.0
CuDNN 7.1
torch 0.3.1 / 0.4.1

In addition, I modified train_cls.py to set

torch.backends.cudnn.benchmark = False

==>
Traceback (most recent call last):
File "train_cls.py", line 217, in <module>
main()
File "train_cls.py", line 125, in main
train(train_dataloader, test_dataloader, model, criterion, optimizer, lr_scheduler, bnm_scheduler, args, num_batch)
File "train_cls.py", line 167, in train
pred = model(points)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/SSD1/dogyoon/Relation-Shape-CNN-master/models/rscnn_ssn_cls.py", line 102, in forward
return self.FC_layer(features.squeeze(-1))
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/modules/batchnorm.py", line 66, in forward
exponential_average_factor, self.eps)
File "/home/mvpserverone/.conda/envs/rscnn/lib/python3.5/site-packages/torch/nn/functional.py", line 1251, in batch_norm
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size [1, 512]
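
One possible reading of this ValueError, offered as an assumption rather than a confirmed diagnosis: BatchNorm in training mode cannot compute statistics from a single sample, and the input size [1, 512] suggests a batch that holds exactly one sample. That can happen when the dataset size leaves a remainder of 1 after division by the batch size. The sketch below uses hypothetical helper names and illustrative sizes (not values from this repository) to show how such a batch arises.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Return the size of every batch a sequential loader would yield."""
    full, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full
    # The incomplete final batch is kept unless drop_last is set.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


def has_singleton_batch(num_samples, batch_size, drop_last=False):
    """True if any batch would contain exactly one sample."""
    return 1 in batch_sizes(num_samples, batch_size, drop_last)


# Illustrative numbers only: 33 samples with batch size 32 leave one sample over.
print(batch_sizes(33, 32))
print(has_singleton_batch(33, 32))
print(has_singleton_batch(33, 32, drop_last=True))
```

If this is the cause, passing drop_last=True to torch.utils.data.DataLoader skips the incomplete final batch; a batch size of 1 set on the command line would trigger the same error on every step.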

I really hope to find a solution to this problem as soon as possible.
Thank you very much.
