An illegal memory access was encountered, CUDA8, Ubuntu 16.04, Quadro 600, XMRig-NVIDIA/2.14.1 #251

Open
adsk1 opened this issue Mar 13, 2019 · 5 comments

adsk1 commented Mar 13, 2019

Starting from 2.12, I get the illegal memory access error shown below. Can anyone help? Thanks!

  • ABOUT XMRig-NVIDIA/2.14.1 gcc/5.4.0
  • LIBS libuv/1.8.0 CUDA/8.0
  • CPU Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz x64 -AES
  • GPU #0 PCI:0000:02:00 Quadro 600 @ 1280/800 MHz 64x4 0x0 arch:21 SMX:2
  • ALGO cryptonight, donate=5%
  • POOL #1 127.0.0.1:5567 variant auto
  • COMMANDS hashrate, health, pause, resume
    [2019-03-13 10:06:07] use pool 127.0.0.1:5567 127.0.0.1
    [2019-03-13 10:06:07] new job from 127.0.0.1:5567 diff 120001 algo cn/r height 1789516
    [CUDA] Error gpu 0: <cryptonight_core_gpu_hash>:740 "an illegal memory access was encountered"
    terminate called after throwing an instance of 'std::runtime_error'
    what(): [CUDA] Error: an illegal memory access was encountered
    Aborted (core dumped)

Spudz76 (Contributor) commented Mar 13, 2019

What driver version are you running? You should be on the last driver version that contained CUDA 8.0 (after 375 but before 384).

Otherwise the CUDA 8.0 Toolkit runs against backward-compatibility code in the driver, which is not ideal and can lead to errors such as this.

adsk1 (Author) commented Mar 14, 2019

Hi @Spudz76, thanks for your advice. My driver version is 384.111. Is that OK?

Thu Mar 14 10:51:28 2019

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 600          Off  | 00000000:02:00.0 Off |                  N/A |
| 37%   54C    P0    N/A /  N/A |      0MiB /   964MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Sun_Sep__4_22:14:01_CDT_2016
Cuda compilation tools, release 8.0, V8.0.44

Spudz76 (Contributor) commented Mar 18, 2019

I believe you've got the first release of the CUDA 8.0 Toolkit there. There is a second, updated one, V8.0.63 or such (GA2), which is what I use. I'm unsure how the older one behaves.

384 contains CUDA 9.0, so no, that's not ideal. You're actually making it worse by running the even older 8.0 against a two-steps-newer driver: double backward compatibility, and more gaps for bugs.

You need to revert to 375.

The Linux driver versions, and the CUDA runtime each one contains, are listed here.

You want the GA2 driver; uninstall 8.0, then install the 8.0-GA2 toolkit.
You probably won't get the 367 driver to work and match your toolkit, plus that's a known-bad CUDA version; otherwise they wouldn't have released a GA2 (the only time they've done so for any version).
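
The driver/toolkit pairing described above can be sketched as a small lookup. This is only an illustration in Python: the table holds just the pairings mentioned in this thread (367 with the first CUDA 8.0 release, 375 with 8.0 GA2, 384 with CUDA 9.0), the name `check_pairing` is hypothetical and not part of XMRig or any NVIDIA tool, and "375.26" is used purely as an example of a 375-branch version string.

```python
# Driver branch -> CUDA runtime it ships with, limited to the pairings
# mentioned in this thread (not a complete or authoritative table).
DRIVER_CUDA = {
    367: "8.0",      # first CUDA 8.0 release
    375: "8.0 GA2",  # updated CUDA 8.0 release
    384: "9.0",
}


def check_pairing(driver_version: str, toolkit_release: str) -> str:
    """Return a short verdict for a driver string such as '384.111'."""
    branch = int(driver_version.split(".")[0])
    bundled = DRIVER_CUDA.get(branch)
    if bundled is None:
        return f"driver {branch}: not in table, check NVIDIA's release notes"
    # Compare only the release number ("8.0"), ignoring the GA2 suffix.
    if bundled.split()[0] == toolkit_release:
        return (f"driver {branch} ships CUDA {bundled}: "
                f"matches toolkit {toolkit_release}")
    return (f"driver {branch} ships CUDA {bundled}: toolkit {toolkit_release} "
            "runs through backward compatibility")


if __name__ == "__main__":
    print(check_pairing("384.111", "8.0"))  # the setup reported in this issue
    print(check_pairing("375.26", "8.0"))   # example 375-branch driver
```

The check only compares release numbers, so it cannot tell the first 8.0 release from GA2; `nvcc --version` is still needed for that.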

Also, autotune is broken for Fermi, as I found now that I've been running it on mine: it picks too large a threads x blocks product and crashes trying to allocate memory (with that exact error message). This is especially true for CN-R, which has a random recompile in it that uses some more GPU memory (and isn't accounted for in the auto-tuning sizer).

I hunted settings for a while and got this as best for CN-R:

            "threads": 10,
            "blocks": 40,
            "bfactor": 6,
            "bsleep": 25,
            "sync_mode": 3,

Other algos required a different layout, but these work well for regular CN variants. Some wanted 8 threads; I think you should try to stick with a multiple of the SMX count (which is 2),
otherwise tune blocks down until it quits failing.
For example, I don't think I got anything to work with heavy other than 4x4, which is slow and dumb.
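
To see why an oversized threads x blocks product fails, here is a rough memory budget in Python. It assumes each concurrent hash needs one scratchpad (2 MiB for regular CryptoNight variants, 4 MiB for heavy) and ignores per-hash state and the extra memory CN-R uses for its recompiled kernel, so real limits will be lower. The 964 MiB total comes from the nvidia-smi output earlier in the thread; the function names are hypothetical.

```python
# Scratchpad size per concurrent hash, in MiB (assumed standard CryptoNight sizes).
SCRATCHPAD_MIB = {"cn": 2, "cn-heavy": 4}


def scratchpad_mib(threads: int, blocks: int, algo: str = "cn") -> int:
    """Scratchpad memory in MiB for threads * blocks concurrent hashes."""
    return threads * blocks * SCRATCHPAD_MIB[algo]


def tune_blocks_down(threads: int, blocks: int, budget_mib: int,
                     algo: str = "cn") -> int:
    """Lower blocks until the scratchpads fit in budget_mib (0 if none fit)."""
    while blocks > 0 and scratchpad_mib(threads, blocks, algo) > budget_mib:
        blocks -= 1
    return blocks


if __name__ == "__main__":
    total_mib = 964  # Quadro 600, from the nvidia-smi output
    print(scratchpad_mib(10, 40))               # 800 MiB for the 10x40 layout
    print(scratchpad_mib(4, 4, "cn-heavy"))     # 64 MiB for heavy at 4x4
    print(tune_blocks_down(10, 64, total_mib))  # 48 blocks is the upper bound
```

Under these assumptions 10x40 already uses 800 of the 964 MiB, which leaves little headroom for anything the estimate leaves out.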

Spudz76 (Contributor) commented Mar 20, 2019

Okay, I've repaired some of this with #255, which eliminates the startup crashing ("unknown error").
If you can, please grab that PR, build it, and see if it reduces the problems.

I found it impossible to tell a bug crash from a clocking crash; they all look the same from the console.
Now that I'm running this patch it no longer crashes at all, and I can clock somewhat closer to what I used to run before CN-R was involved, before anything starts failing (which is normal).

Spudz76 (Contributor) commented Mar 20, 2019

Also, CN-GPU hates the above 10x40 6x25 combo; that one only really works on RWZ and maybe the old 0/1/2 variants.

12x26 10x25 is the best I've found, at 31.5 H/s; any more and it crashes. I think it saturates the floating-point units (which these cards have fewer of) before it hits the usual limits, hence the smaller block count.

And everything hates a bfactor below 8; some coins want 10.
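
For reference, the two combinations quoted in this thread can be written out as thread entries. The Python sketch below reuses only the key names from the config fragment posted earlier; the `SETTINGS` table and the `thread_entry` helper are hypothetical. `sync_mode` is left out because the thread gives a value for it only in the CN-R fragment.

```python
import json

# (threads, blocks, bfactor, bsleep) as reported in this thread for a Quadro 600.
SETTINGS = {
    "CN-R": (10, 40, 6, 25),
    "CN-GPU": (12, 26, 10, 25),
}


def thread_entry(algo: str) -> dict:
    """Build a config-style dict for one of the layouts listed in SETTINGS."""
    threads, blocks, bfactor, bsleep = SETTINGS[algo]
    return {"threads": threads, "blocks": blocks,
            "bfactor": bfactor, "bsleep": bsleep}


if __name__ == "__main__":
    for algo in SETTINGS:
        print(algo, json.dumps(thread_entry(algo)))
```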
