
fix CUDA_ERROR_ILLEGAL_ADDRESS bug #63


Open · wants to merge 1 commit into master

Conversation

@1374839016

This fixes the memory access bug described in #55. I force CuPy to allocate memory on the same device as the PyTorch tensors.

fix CUDA_ERROR_ILLEGAL_ADDRESS
@sniklaus
Owner

Huge thanks for bringing this up!

Could you provide some more technical details on how this makes a difference? Currently, all the involved tensors will be on the same device as the first input as per:

rbot0 = one.new_zeros([ one.shape[0], one.shape[2] + 8, one.shape[3] + 8, one.shape[1] ])
rbot1 = one.new_zeros([ one.shape[0], one.shape[2] + 8, one.shape[3] + 8, one.shape[1] ])
one = one.contiguous(); assert(one.is_cuda == True)
two = two.contiguous(); assert(two.is_cuda == True)
output = one.new_zeros([ one.shape[0], 81, one.shape[2], one.shape[3] ])

I am hence a little bit confused on what the proposed fix would change. 🤔
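For context, `new_zeros` inherits both the dtype and the device of the source tensor, so the allocations above do land on the input's device; a minimal CPU-only check (assuming PyTorch is installed):

```python
import torch

# new_zeros creates a tensor with the same dtype and device as `one`,
# which is why rbot0/rbot1/output all end up on the input's device.
one = torch.zeros(2, 16, 8, 8)
rbot0 = one.new_zeros([ one.shape[0], one.shape[2] + 8, one.shape[3] + 8, one.shape[1] ])

assert rbot0.device == one.device
assert rbot0.dtype == one.dtype
assert rbot0.shape == (2, 16, 16, 16)
```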

@1374839016
Author

Sorry, I don't know for sure, but my guess is that the kernel launch (and its shared-memory allocation) happens on the default device (GPU 0) rather than on the device that holds the tensors:

cupy_launch('kernel_Correlation_updateOutput', cupy_kernel('kernel_Correlation_updateOutput', {
    'rbot0': rbot0,
    'rbot1': rbot1,
    'top': output
}))(
    grid=tuple([ output.shape[3], output.shape[2], output.shape[0] ]),
    block=tuple([ 32, 1, 1 ]),
    shared_mem=one.shape[1] * 4,
    args=[ cupy.int32(n), rbot0.data_ptr(), rbot1.data_ptr(), output.data_ptr() ]
)
