Running Intro Notebook on WSL #122

Closed
ashleemilton opened this issue Aug 3, 2022 · 4 comments

@ashleemilton
Hello,

I am having issues trying to run the provided intro notebook for ColBERTv2. I am working in an Anaconda environment, created using the commands provided, inside an Ubuntu WSL virtual machine with a single CUDA-compatible NVIDIA GPU. When I try to index the collection with nranks=1, I encounter the error:
NCCL version 2.7.8
ncclSystemError: System call (socket, malloc, munmap, etc) failed.

I have tried tracking down the cause of this error, but the only information I can find suggests it may be due to running on a single GPU. I am stuck on how to fix this and would greatly appreciate any guidance. Ultimately I am trying to fine-tune ColBERT, so I am also interested in the response to issue #121.
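
For reference, the indexing step I am running follows the intro notebook's pattern and looks roughly like this (the checkpoint and collection paths here are placeholders, not my exact values):

from colbert import Indexer
from colbert.infra import Run, RunConfig, ColBERTConfig

if __name__ == '__main__':
    # nranks=1 because I only have a single GPU.
    with Run().context(RunConfig(nranks=1, experiment='intro')):
        config = ColBERTConfig(doc_maxlen=300, nbits=2)
        indexer = Indexer(checkpoint='/path/to/colbertv2.0', config=config)
        indexer.index(name='intro.nbits=2', collection='/path/to/collection.tsv')

The NCCL error above is raised during the call to indexer.index.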

@santhnm2 santhnm2 self-assigned this Aug 3, 2022
@santhnm2
Collaborator

santhnm2 commented Aug 3, 2022

Hi @ashleemilton, I haven't tried working with CUDA through WSL, but NVIDIA lists some setup steps here in case you haven't seen them before: https://docs.nvidia.com/cuda/wsl-user-guide/index.html
If you have seen those, are you able to successfully run any other GPU code besides ColBERT? For example, does python -c "import torch; print(torch.cuda.is_available())" print True? Also, could you post the output of conda list and nvidia-smi?
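
A minimal script that bundles these checks together (standard PyTorch calls only, nothing ColBERT-specific):

import torch

# Basic CUDA visibility checks.
print("CUDA available:", torch.cuda.is_available())
print("Device count:  ", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0:      ", torch.cuda.get_device_name(0))
print("CUDA build:    ", torch.version.cuda)
# NCCL ships inside the PyTorch binaries, so this reports the bundled version.
print("NCCL version:  ", torch.cuda.nccl.version())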

@ashleemilton
Author

ashleemilton commented Aug 3, 2022

I did follow the additional steps for working with CUDA through WSL, and torch.cuda.is_available() prints True. My Windows environment is Windows 11, in case that matters.

Here is the output from conda list in the colbert environment:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
anyio 3.6.1 pypi_0 pypi
argon2-cffi 21.3.0 pypi_0 pypi
argon2-cffi-bindings 21.2.0 pypi_0 pypi
attrs 21.4.0 pypi_0 pypi
babel 2.10.3 pypi_0 pypi
backcall 0.2.0 pypi_0 pypi
beautifulsoup4 4.11.1 pypi_0 pypi
bitarray 2.5.1 pypi_0 pypi
blas 2.115 mkl conda-forge
blas-devel 3.9.0 15_linux64_mkl conda-forge
bleach 5.0.1 pypi_0 pypi
blis 0.7.8 pypi_0 pypi
bzip2 1.0.8 h7f98852_4 conda-forge
ca-certificates 2022.6.15 ha878542_0 conda-forge
catalogue 2.0.7 pypi_0 pypi
certifi 2022.6.15 pypi_0 pypi
cffi 1.15.1 pypi_0 pypi
charset-normalizer 2.1.0 pypi_0 pypi
click 8.1.3 pypi_0 pypi
cudatoolkit 11.1.74 h6bb024c_0 nvidia
cupy-cuda110 10.6.0 pypi_0 pypi
cupy-cuda111 10.6.0 pypi_0 pypi
cymem 2.0.6 pypi_0 pypi
debugpy 1.6.0 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
defusedxml 0.7.1 pypi_0 pypi
entrypoints 0.4 pypi_0 pypi
faiss 1.7.0 py37cuda111hcc9d9d6_8_cuda conda-forge
faiss-gpu 1.7.0 h788eb59_8 conda-forge
fastjsonschema 2.15.3 pypi_0 pypi
fastrlock 0.8 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.7.1 pypi_0 pypi
freetype 2.10.4 h0708190_1 conda-forge
gitdb 4.0.9 pypi_0 pypi
gitpython 3.1.27 pypi_0 pypi
gmp 6.2.1 h58526e2_0 conda-forge
gnutls 3.6.13 h85f3911_1 conda-forge
huggingface-hub 0.8.1 pypi_0 pypi
idna 3.3 pypi_0 pypi
importlib-metadata 4.12.0 pypi_0 pypi
importlib-resources 5.8.0 pypi_0 pypi
ipykernel 6.15.0 pypi_0 pypi
ipython 7.34.0 pypi_0 pypi
ipython-genutils 0.2.0 pypi_0 pypi
ipywidgets 7.7.1 pypi_0 pypi
jedi 0.18.1 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
joblib 1.1.0 pypi_0 pypi
jpeg 9b h024ee3a_2
json5 0.9.8 pypi_0 pypi
jsonschema 4.6.1 pypi_0 pypi
jupyter 1.0.0 pypi_0 pypi
jupyter-client 7.3.4 pypi_0 pypi
jupyter-console 6.4.4 pypi_0 pypi
jupyter-core 4.10.0 pypi_0 pypi
jupyter-server 1.18.1 pypi_0 pypi
jupyterlab 3.4.3 pypi_0 pypi
jupyterlab-pygments 0.2.2 pypi_0 pypi
jupyterlab-server 2.15.0 pypi_0 pypi
jupyterlab-widgets 1.1.1 pypi_0 pypi
lame 3.100 h7f98852_1001 conda-forge
langcodes 3.3.0 pypi_0 pypi
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libblas 3.9.0 15_linux64_mkl conda-forge
libcblas 3.9.0 15_linux64_mkl conda-forge
libfaiss 1.7.0 cuda111hf54f04a_8_cuda conda-forge
libfaiss-avx2 1.7.0 cuda111h1234567_8_cuda conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 12.1.0 h8d9b700_16 conda-forge
libgfortran-ng 12.1.0 h69a702a_16 conda-forge
libgfortran5 12.1.0 hdcd56e2_16 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
liblapack 3.9.0 15_linux64_mkl conda-forge
liblapacke 3.9.0 15_linux64_mkl conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libstdcxx-ng 12.1.0 ha89aaad_16 conda-forge
libtiff 4.0.9 he6b73bb_1 conda-forge
libuv 1.43.0 h7f98852_0 conda-forge
libzlib 1.2.12 h166bdaf_1 conda-forge
llvm-openmp 14.0.4 he0ac6c6_0 conda-forge
markupsafe 2.1.1 pypi_0 pypi
matplotlib-inline 0.1.3 pypi_0 pypi
mistune 0.8.4 pypi_0 pypi
mkl 2022.1.0 h84fe81f_915 conda-forge
mkl-devel 2022.1.0 ha770c72_916 conda-forge
mkl-include 2022.1.0 h84fe81f_915 conda-forge
murmurhash 1.0.7 pypi_0 pypi
nbclassic 0.4.2 pypi_0 pypi
nbclient 0.6.6 pypi_0 pypi
nbconvert 6.5.0 pypi_0 pypi
nbformat 5.4.0 pypi_0 pypi
ncurses 6.3 h27087fc_1 conda-forge
nest-asyncio 1.5.5 pypi_0 pypi
nettle 3.6 he412f7d_0 conda-forge
ninja 1.10.2.3 pypi_0 pypi
notebook 6.4.12 pypi_0 pypi
notebook-shim 0.1.0 pypi_0 pypi
numpy 1.21.6 py37h976b520_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openh264 2.1.1 h780b84a_0 conda-forge
openssl 3.0.5 h166bdaf_0 conda-forge
packaging 21.3 pypi_0 pypi
pandocfilters 1.5.0 pypi_0 pypi
parso 0.8.3 pypi_0 pypi
pathy 0.6.2 pypi_0 pypi
pexpect 4.8.0 pypi_0 pypi
pickleshare 0.7.5 pypi_0 pypi
pillow 5.4.1 py37h34e0f95_0
pip 21.0.1 pyhd8ed1ab_0 conda-forge
preshed 3.0.6 pypi_0 pypi
prometheus-client 0.14.1 pypi_0 pypi
prompt-toolkit 3.0.30 pypi_0 pypi
psutil 5.9.1 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pycparser 2.21 pypi_0 pypi
pydantic 1.8.2 pypi_0 pypi
pygments 2.12.0 pypi_0 pypi
pyparsing 3.0.9 pypi_0 pypi
pyrsistent 0.18.1 pypi_0 pypi
python 3.7.12 hf930737_100_cpython conda-forge
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.7 2_cp37m conda-forge
pytorch 1.9.0 py3.7_cuda11.1_cudnn8.0.5_0 pytorch
pytz 2022.1 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
pyzmq 23.2.0 pypi_0 pypi
qtconsole 5.3.1 pypi_0 pypi
qtpy 2.1.0 pypi_0 pypi
readline 8.1.2 h0f457ee_0 conda-forge
regex 2022.6.2 pypi_0 pypi
requests 2.28.1 pypi_0 pypi
sacremoses 0.0.53 pypi_0 pypi
scipy 1.7.3 pypi_0 pypi
send2trash 1.8.0 pypi_0 pypi
setuptools 63.1.0 py37h89c1867_0 conda-forge
six 1.16.0 pypi_0 pypi
smart-open 5.2.1 pypi_0 pypi
smmap 5.0.0 pypi_0 pypi
sniffio 1.2.0 pypi_0 pypi
soupsieve 2.3.2.post1 pypi_0 pypi
spacy 3.3.1 pypi_0 pypi
spacy-legacy 3.0.9 pypi_0 pypi
spacy-loggers 1.0.2 pypi_0 pypi
sqlite 3.39.0 h4ff8645_0 conda-forge
srsly 2.4.3 pypi_0 pypi
tbb 2021.5.0 h924138e_1 conda-forge
terminado 0.15.0 pypi_0 pypi
thinc 8.0.17 pypi_0 pypi
tinycss2 1.1.1 pypi_0 pypi
tk 8.6.12 h27826a3_0 conda-forge
tokenizers 0.10.3 pypi_0 pypi
torchaudio 0.9.0 py37 pytorch
torchvision 0.10.0 py37_cu111 pytorch
tornado 6.2 pypi_0 pypi
tqdm 4.64.0 pypi_0 pypi
traitlets 5.3.0 pypi_0 pypi
transformers 4.10.0 pypi_0 pypi
typer 0.4.2 pypi_0 pypi
typing-extensions 4.1.1 pypi_0 pypi
ujson 5.4.0 pypi_0 pypi
urllib3 1.26.9 pypi_0 pypi
wasabi 0.9.1 pypi_0 pypi
wcwidth 0.2.5 pypi_0 pypi
webencodings 0.5.1 pypi_0 pypi
websocket-client 1.3.3 pypi_0 pypi
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
widgetsnbextension 3.6.1 pypi_0 pypi
xz 5.2.5 h516909a_1 conda-forge
zipp 3.8.0 pypi_0 pypi
zlib 1.2.12 h166bdaf_1 conda-forge

And nvidia-smi:
Wed Aug 3 13:46:07 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06 Driver Version: 516.59 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:3B:00.0 Off | N/A |
| N/A 38C P0 N/A / N/A | 75MiB / 4096MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

@santhnm2
Collaborator

santhnm2 commented Aug 3, 2022

Ah, it seems from this thread that single-GPU support on WSL is only available as of NCCL version 2.10.3, and multi-GPU support requires NCCL version 2.11.4: NVIDIA/nccl#442
I ran python -c "import torch; print(torch.cuda.nccl.version())" in my conda environment and got (2, 10, 3), which is more up to date than your version. The PyTorch build that appears in my conda list is pytorch 1.12.0 py3.8_cuda11.3_cudnn8.3.2_0 pytorch, which is also a bit more recent (my understanding is that NCCL comes pre-built with the PyTorch binaries). Just to confirm, are you using the main branch?
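
If a newer PyTorch (and therefore a newer bundled NCCL) doesn't resolve it, NCCL can also be made much more verbose with its standard debug environment variables; setting them before the indexing code runs should show where the socket/system call is failing (these are generic NCCL settings, not specific to ColBERT):

import os

# NCCL reads these at initialization time, so set them before any
# distributed code runs (e.g., at the top of the intro notebook).
os.environ["NCCL_DEBUG"] = "INFO"        # detailed init/error logging
os.environ["NCCL_DEBUG_SUBSYS"] = "ALL"  # include all NCCL subsystems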

@santhnm2
Collaborator

Closing due to inactivity; please re-open if this is still an issue.
