cudaErrorInvalidDeviceFunction: invalid device function #48

Open
illuosion opened this issue Aug 18, 2023 · 1 comment

Comments

@illuosion

Hello, I ran the experiment following the install guide, and I ran into a CUDA problem:

  • using 8 Tesla K80s [8,9,10,11,12,13,14,15,16]
    `[08/18 09:54:47 main-logger]: #Model parameters: 32311715
    [08/18 09:54:47 main-logger]: class_weight: tensor([ 3.1557, 8.7029, 7.8281, 6.1354, 6.3161, 7.9937, 8.9704, 10.1922,
    1.6155, 4.2187, 1.9385, 5.5455, 2.0198, 2.6261, 1.3212, 5.1102,
    2.5492, 5.8585, 7.3929], device='cuda:0')
    [08/18 09:54:47 main-logger]: loss_name: ce_loss
    [08/18 09:54:47 main-logger]: train_data samples: '19130'
    [08/18 09:54:47 main-logger]: val_data samples: '4071'
    [08/18 09:54:47 main-logger]: scheduler: Poly. scheduler_update: step
    [08/18 09:54:47 main-logger]: lr: [0.006, 0.0006000000000000001]
    [Exception|implicit_gemm_pair]indices=torch.Size([78654, 4]),bs=1,ss=[1977, 1756, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([80831, 4]),bs=1,ss=[1624, 2049, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([92342, 4]),bs=1,ss=[2048, 1388, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([82930, 4]),bs=1,ss=[2049, 2049, 129],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([73867, 4]),bs=1,ss=[2049, 1523, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([93205, 4]),bs=1,ss=[1718, 2048, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([88331, 4]),bs=1,ss=[2049, 2049, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    [Exception|implicit_gemm_pair]indices=torch.Size([84985, 4]),bs=1,ss=[1975, 1676, 128],algo=ConvAlgo.MaskImplicitGemm,ksize=[3, 3, 3],stride=[1, 1, 1],padding=[1, 1, 1],dilation=[1, 1, 1],subm=True,transpose=False
    SPCONV_DEBUG_SAVE_PATH not found, you can specify SPCONV_DEBUG_SAVE_PATH as debug data save path to save debug data which can be attached in a issue.
    Traceback (most recent call last):
    File "train.py", line 902, in <module>
    main()
    File "train.py", line 90, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
    File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
    File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
    File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/fyy/SphereFormer-master/train.py", line 410, in main_worker
loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch, scaler, scheduler, gpu)
File "/home/fyy/SphereFormer-master/train.py", line 498, in train
output = model(sinput, xyz, batch)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/SphereFormer-master/model/unet_spherical_transformer.py", line 284, in forward
output = self.input_conv(input)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/modules.py", line 137, in forward
input = module(input)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 404, in forward
raise e
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/conv.py", line 395, in forward
timer=input._timer)
File "/home/fyy/anaconda3/envs/sphereformer/lib/python3.7/site-packages/spconv/pytorch/ops.py", line 359, in get_indice_pairs_implicit_gemm
mask_argsort_tv[j], stream)
RuntimeError: radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function`

@illuosion

I searched for a solution and found a suggestion to change the PyTorch version to 1.9.x, but that led to a new problem.
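
For anyone hitting the same message: the CUDA runtime documents `cudaErrorInvalidDeviceFunction` as "the requested device function does not exist or is not compiled for the proper device architecture", and the Tesla K80 is a compute capability 3.7 device. One possible cause, not confirmed anywhere in this thread, is that an installed binary carries no kernels for that architecture. The sketch below compares each GPU's capability with the architectures PyTorch itself was built for. The helpers `parse_arch`, `binary_covers`, and `report_devices` are hypothetical names made up for this example. The traceback fails inside spconv, whose wheels are compiled separately, so a passing result here says nothing about the spconv build.

```python
import re


def parse_arch(tag):
    """Turn 'sm_37' or 'compute_80' into (3, 7) or (8, 0); None if unrecognized."""
    m = re.fullmatch(r"(sm|compute)_(\d+)", tag)
    if not m:
        return None
    digits = m.group(2)
    # The last digit is the minor version, everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def binary_covers(capability, arch_list):
    """Check whether a build with the given arch tags can run on `capability`.

    Native code (sm_XY) runs on devices with the same major version and an
    equal or higher minor version. PTX (compute_XY) can be JIT-compiled by
    the driver for any device of equal or higher capability.
    """
    for tag in arch_list:
        arch = parse_arch(tag)
        if arch is None:
            continue
        if tag.startswith("sm_"):
            if arch[0] == capability[0] and arch[1] <= capability[1]:
                return True
        elif arch <= capability:  # compute_XY (PTX)
            return True
    return False


def report_devices():
    """Print, per visible GPU, whether PyTorch's own build covers it."""
    import torch  # imported here so the helpers above work without PyTorch

    archs = torch.cuda.get_arch_list()
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        print(i, name, cap, "covered" if binary_covers(cap, archs) else "NOT covered")
```

Calling `report_devices()` inside the `sphereformer` environment prints one line per GPU. If PyTorch reports the K80 as covered and the error persists, the spconv build is the next thing to check, for example by checking which architectures the installed wheel targets or by building spconv from source for this GPU.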
