[BUG] JSMA massive gpu memory consumption #187


Closed
Dontoronto opened this issue Jun 10, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@Dontoronto

✨ Short description of the bug [tl;dr]

Today I tried to run JSMA on an ImageNet sample with shape (1, 3, 224, 224). The JSMA code got stuck in the approximation for a bit, and then an error message popped up saying that JSMA needs to allocate 84.41 GiB of GPU memory, while my NVIDIA card only has 6 GB.
Looking into the code, I could see a lot of clones, inits, etc., which cost a lot of memory, plus device transfers and extra computation. I think someone smarter than me could optimize the code to run with lower memory consumption.

💬 Detailed code and results

Traceback (most recent call last):
  File "C:\Users\Domin\anaconda3\envs\NeuronalNetwork\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Domin\anaconda3\envs\NeuronalNetwork\lib\site-packages\torchattacks\attacks\jsma.py", line 116, in saliency_map
    alpha = target_tmp.view(-1, 1, nb_features) + target_tmp.view(
  File "C:\Users\Domin\anaconda3\envs\NeuronalNetwork\lib\site-packages\torch\utils\_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 84.41 GiB. GPU 0 has a total capacity of 6.00 GiB of which 4.21 GiB is free. Of the allocated memory 704.63 MiB is allocated by PyTorch, and 29.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

@Dontoronto Dontoronto added the bug Something isn't working label Jun 10, 2024
@rikonaka
Contributor

rikonaka commented Jun 11, 2024

Hi @Dontoronto, have you tested other attacks such as PGD or CW on your NVIDIA device? Given that you're trying to attack ImageNet on a device with only 6 GB, I'm not quite sure whether your GPU is just too small or whether it's a problem with the code.

@Dontoronto
Author

Yes, I tested it. I'm currently running DeepFool and PGD attacks without any problems. The problem occurs at this line:

File "C:\Users\Domin\anaconda3\envs\NeuronalNetwork\lib\site-packages\torchattacks\attacks\jsma.py", line 116, in saliency_map
    alpha = target_tmp.view(-1, 1, nb_features) + target_tmp.view(

The variable nb_features is 150528 because of the flattened ImageNet sample. I don't know whether this is really a bug or whether my setup is just too weak.
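
For reference, a back-of-envelope sketch of where the 84.41 GiB comes from, assuming the truncated second operand is reshaped to (-1, nb_features, 1) so the addition broadcasts to a pairwise matrix (that reshape is my assumption, not taken from the source):

# Hypothetical estimate of the memory needed by the broadcasted addition in
# saliency_map for one flattened ImageNet sample, assuming float32 elements.
nb_features = 3 * 224 * 224                  # 150528
# (1, 1, nb_features) + (1, nb_features, 1) broadcasts to (1, nb_features, nb_features)
alpha_bytes = nb_features * nb_features * 4
print(alpha_bytes / 2**30)                   # ~84.41 GiB, matching the traceback above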

@rikonaka
Contributor


Roger that, I'm going to do some testing and debugging to try to find the problem and fix it! 😘

@Dontoronto
Author

I would like to give more information, but my computer is currently generating a DeepFool dataset. Thank you very much! 😃

@rikonaka
Contributor

rikonaka commented Jun 12, 2024


It seems that I have found the cause of the problem: the dimension of the input tensor in the Jacobian matrix calculation is far too large.

import torch

def compute_jacobian(model, x):
    def model_forward(input):
        return model(input)
    # For x of shape (B, 3, 224, 224) and a C-class model, the result has shape
    # (B, C, B, 3, 224, 224), so its size grows roughly with B^2.
    jacobian = torch.autograd.functional.jacobian(model_forward, x)
    return jacobian

In the above code, even if I input just 3 images (from ImageNet), the GPU memory usage reaches 11 GB; with 5 images it reaches 16 GB, and with 6 images 36 GB.

[Screenshot: GPU memory usage with 5 images]

So even if batch_size is set to 10, it still requires close to 80+ GB of GPU memory on the ImageNet dataset.
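
As a rough sanity check (my own arithmetic, not taken from the library), the Jacobian tensor alone already grows quadratically with the batch size; the measured numbers above are higher still because of intermediate buffers, but the trend matches:

# Hypothetical size of the tensor returned by torch.autograd.functional.jacobian
# for a batch of ImageNet images and a 1000-class model, assuming float32.
def jacobian_gib(batch_size, num_classes=1000, features=3 * 224 * 224):
    # Result shape: (batch, num_classes, batch, 3, 224, 224)
    elements = batch_size * num_classes * batch_size * features
    return elements * 4 / 2**30

for b in (3, 5, 6, 10):
    print(b, round(jacobian_gib(b), 1))      # 3 -> ~5.0, 5 -> ~14.0, 6 -> ~20.2, 10 -> ~56.1 GiB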

I'll try to improve the algorithm and try to make it work on ImageNet!

@Dontoronto
Author

You are awesome! I really appreciate your effort :)

@rikonaka
Contributor

rikonaka commented Jun 23, 2024

Hi @Dontoronto, on a sad note: I've been trying to reduce memory consumption on ImageNet for a while now and have rewritten the whole JSMA attack code (8c065ec), but I've found that this seems to be an unattainable goal.

Here are my reasons why.

First, according to the original JSMA attack paper, Algorithm 2 and Algorithm 3:

[Figure: Algorithm 2 from the JSMA paper]

The JSMA attack will traverse all (p1, p2) pairs in the search domain, and on ImageNet that domain has 3 * 224 * 224 = 150,528 features, so there are (3 * 224 * 224)^2 combinations of p1 and p2 to look up. On a very small dataset this lookup is feasible, but on ImageNet the lookup matrix becomes unbelievably huge, leading to enormous GPU memory consumption.
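
For scale, a quick count of those candidate pairs on ImageNet (my own arithmetic, ignoring the p1 != p2 constraint and symmetry):

# Rough number of (p1, p2) feature pairs for a flattened ImageNet input.
n = 3 * 224 * 224
print(n, n ** 2)    # 150528 features, ~2.27e10 (p1, p2) combinations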

[Figure: Algorithm 3 from the JSMA paper]

Second, when computing the SM (saliency map), we need to run the addition once for each pair of elements in the matrix, an operation whose memory cost is O(n^2). You read that right, it really is O(n^2); I think that's an inherent disadvantage of JSMA. 10 ImageNet images go from shape (10, 150528) to (10, 150528, 150528), and backpropagating through such a large tensor is extremely memory intensive (see the sketch after the equation figures below).

[Figure: Equation 9 from the JSMA paper]

[Figure: Equation 10 from the JSMA paper]
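
To make the O(n^2) growth concrete, here is a small illustrative sketch of the broadcasted pairwise sum behind the saliency map (my own example; the names and sizes are hypothetical, only the (batch, n, n) broadcast pattern is taken from the traceback above):

import torch

# Adding a (B, 1, n) view to a (B, n, 1) view broadcasts to a (B, n, n) tensor,
# i.e. O(n^2) memory per image. With a small n this is still cheap:
B, n = 10, 28 * 28                      # e.g. flattened MNIST-sized inputs
target_tmp = torch.randn(B, n)
alpha = target_tmp.view(B, 1, n) + target_tmp.view(B, n, 1)
print(alpha.shape)                      # torch.Size([10, 784, 784])
# With n = 3 * 224 * 224 = 150528 (ImageNet), the same float32 tensor would hold
# 10 * 150528**2 elements, on the order of 840 GiB.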

In the end, this is actually not a bug. If you are planning to run JSMA attacks on ImageNet: my own experimental equipment is limited, and I tried to run it on a server with 150 GB of RAM but couldn't get these attacks to succeed, so you could try a server with more than 200 GB of RAM 😂. If you're successful, remember to get back to me on how much RAM you ended up using on the server!

@Dontoronto
Author

@rikonaka Sorry for causing you so much work. Everything you mentioned sounds plausible. I just stumbled over this while generating samples for my thesis. Unfortunately I only have a 6 GB GPU and can't use JSMA for the ImageNet case, so I'll try to use OnePixel to get L0 attacks.
Thank you very much! Do I need to close this issue, or will you close it? I don't know if you still have something in mind regarding this :)

@rikonaka
Contributor

You can close this issue; if I have an update I'll comment below! 👍
