CUDA error out of memory #2001
Comments
Here is more error info:
File "C:\Users\Sahay\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\script_ops.py", line 273, in __call__
File "C:\Users\Sahay\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\script_ops.py", line 151, in __call__
File "C:\Users\Sahay\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\ops\script_ops.py", line 158, in _call
File "C:\Users\Sahay\anaconda3\envs\sleap\lib\site-packages\tensorflow\python\autograph\impl\api.py", line 649, in wrapper
File "C:\Users\Sahay\anaconda3\envs\sleap\lib\site-packages\sleap\nn\data\providers.py", line 405, in py_fetch_frame
File "C:\Users\Sahay\anaconda3\envs\sleap\lib\site-packages\sleap\io\video.py", line 1104, in get_frame
File "C:\Users\Sahay\anaconda3\envs\sleap\lib\site-packages\sleap\io\video.py", line 496, in get_frame
KeyError: "Unable to load frame 26621 from MediaVideo(filename='D:/nate/SLEAP retroorbital injected/Split videos/Cage_1_part1.avi', grayscale=True, bgr=True, dataset='', input_format='')."
Process return code: 1 |
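The traceback above ends in a KeyError for a single frame index, so it can help to find out which frames fail to load before running inference. The following is a minimal sketch, not part of SLEAP: `find_unreadable_frames` is a hypothetical helper that takes any frame-reading callable and records the indices for which it raises.

```python
from typing import Callable, List


def find_unreadable_frames(read_frame: Callable[[int], object], n_frames: int) -> List[int]:
    """Return every frame index for which read_frame raises an exception.

    read_frame is any callable that takes a frame index and either returns
    the frame or raises (as get_frame does in the traceback above).
    """
    bad = []
    for idx in range(n_frames):
        try:
            read_frame(idx)
        except Exception:
            # Record the index and keep going so all bad frames are reported.
            bad.append(idx)
    return bad


# Hypothetical usage with SLEAP (assumes sleap.load_video and Video.get_frame
# behave as the traceback suggests; not verified against v1.3.3):
#   import sleap
#   video = sleap.load_video("Cage_1_part1.avi")
#   print(find_unreadable_frames(video.get_frame, len(video)))
```

If only the final frames are reported, that would be consistent with the observation that skipping the last frame avoids the crash.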
Hi @ngreen123, Can you provide the command you ran to get this error? I can't tell what your intended goal was.
It seems like you have two issues. The first is an out-of-memory issue. Despite having 20 GB of GPU memory, it looks like you need ~34 GB to train with your given hyperparameters. Can you provide these hyperparameters (the contents of the training config file or of the model)? The number of frames matters less than the batch size or the image size, since we train and perform inference in batches. Whether you are training a model or running inference, you can decrease the batch size to reduce the amount of GPU memory used. When training, you can also decrease the input scale to lower the resolution of each frame.
The second issue is that one of your frames cannot be loaded. This frame may be corrupted. Re-encoding the video, or saving it in a different file format from the original frames, could solve this issue.
Best, Elizabeth |
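To make the batch size and input scale advice concrete, here is a rough back-of-the-envelope sketch. It only counts the bytes of one input batch and is not how SLEAP or TensorFlow account for memory (activations and weights are much larger), but it shows the scaling: memory grows linearly with batch size and with the square of the input scale. The function name and the 1024 x 1280 frame size are assumptions for illustration only.

```python
def input_batch_bytes(batch_size, height, width, channels=1,
                      input_scale=1.0, bytes_per_value=4):
    """Approximate bytes needed for one float32 input batch.

    The frame is resized by input_scale along both axes, so the pixel
    count (and the memory) scales with input_scale squared.
    """
    scaled_h = int(height * input_scale)
    scaled_w = int(width * input_scale)
    return batch_size * scaled_h * scaled_w * channels * bytes_per_value


# Hypothetical grayscale 1024 x 1280 frames.
full = input_batch_bytes(batch_size=4, height=1024, width=1280)
half_batch = input_batch_bytes(batch_size=2, height=1024, width=1280)
half_scale = input_batch_bytes(batch_size=4, height=1024, width=1280, input_scale=0.5)

print(full, half_batch, half_scale)
```

Halving the batch size halves the estimate, while halving the input scale cuts it to a quarter.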
Hi @eberrigan, thanks for getting back to us! Here's our command-line output once I've started inference, and I've attached a screenshot of our parameters:
Started inference at: 2024-10-22 16:40:34.051479
INFO:sleap.nn.inference:Auto-selected GPU 0 with 19698 MiB of free memory.
System: ####### |
Hi @ngreen123,
Thanks! Elizabeth |
Hi,
Thanks again! |
Please take a look at the examples here https://sleap.ai/guides/cli.html#sleap-track. You can run tracking without inference if the predictions file is specified and no models are specified. So does the inference with tracking complete when using an mp4? |
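Since re-encoding to MP4 came up earlier in the thread, here is one way the conversion could be scripted. This is a sketch under assumptions: it requires `ffmpeg` on the PATH, the helper names are hypothetical, and the encoder settings (H.264, yuv420p, CRF 23) are a common choice rather than something confirmed in this thread.

```python
import subprocess
from typing import List


def build_reencode_command(src: str, dst: str, crf: int = 23) -> List[str]:
    """Build an ffmpeg command that re-encodes src to H.264 at dst."""
    return [
        "ffmpeg",
        "-y",                   # overwrite dst if it already exists
        "-i", src,              # input video
        "-c:v", "libx264",      # H.264 video encoder
        "-pix_fmt", "yuv420p",  # widely supported pixel format
        "-preset", "superfast", # favor encoding speed over file size
        "-crf", str(crf),       # quality setting (lower means higher quality)
        dst,
    ]


def reencode(src: str, dst: str) -> None:
    """Run the re-encode and raise CalledProcessError if ffmpeg fails."""
    subprocess.run(build_reencode_command(src, dst), check=True)


# Hypothetical usage (the output file name is an assumption):
#   reencode("Cage_1_part1.avi", "Cage_1_part1.mp4")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, such as the one in the traceback.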
Great, thank you! I tried it out with tracking. Using the same tracking criteria as before, I was still getting memory issues. However, when I bumped the elapsed frame window down to 2, I got almost no memory warnings or errors!
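One plausible reason a smaller window helps is that a window-based tracker only needs to keep the instances from the last few frames as matching candidates. The sketch below is an illustration of that idea with a bounded buffer; it is an assumption about the general technique, not SLEAP's tracker code, and `RecentFrames` is a hypothetical class.

```python
from collections import deque


class RecentFrames:
    """Keep the instances from only the most recent `window` frames."""

    def __init__(self, window: int):
        # deque with maxlen silently drops the oldest frame when full.
        self.frames = deque(maxlen=window)

    def add(self, instances) -> None:
        """Store the instances detected in the newest frame."""
        self.frames.append(list(instances))

    def candidate_count(self) -> int:
        """Number of past instances a new frame would be matched against."""
        return sum(len(frame) for frame in self.frames)


small = RecentFrames(window=2)
large = RecentFrames(window=5)
for frame_idx in range(10):
    detections = ["a", "b", "c"]  # three hypothetical instances per frame
    small.add(detections)
    large.add(detections)

print(small.candidate_count(), large.candidate_count())
```

With three instances per frame, the window of 2 holds 6 candidates and the window of 5 holds 15, so the amount of retained data grows with the window size.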
Yay! Please let us know if you have any more issues. I will mark this issue as done. |
#CUDA error out of memory despite having 20 GB GPU
We are running a 27K-frame video and receiving error messages saying:
2024-10-21 17:21:44.237412: W .\tensorflow/core/common_runtime/device/device_host_allocator.h:46] could not allocate pinned host memory of size: 34359738368
2024-10-21 17:21:44.237751: E tensorflow/stream_executor/cuda/cuda_driver.cc:802] failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
despite having a 20 GB GPU. We were previously trialing smaller videos of only about 50 frames and still had these errors pop up, although only a few compared to the ~1000 we are getting now. The video eventually finishes, but a "type 1" error (process return code 1) then occurs without labeling the frames. We've noticed that if we don't analyze the last frame of the video, we still get the memory errors, but the type 1 error doesn't happen.
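For reference, the size in the two log lines can be converted to more familiar units. The short calculation below shows that the failed request is exactly 32 GiB (about 34.36 GB). Note that both messages mention "host" memory, which suggests the allocation being attempted is pinned system RAM rather than GPU memory; that reading is an interpretation of the log wording, not something confirmed in this thread.

```python
requested_bytes = 34359738368  # size reported in both log lines

GIB = 2 ** 30  # bytes per gibibyte
GB = 10 ** 9   # bytes per (decimal) gigabyte

requested_gib = requested_bytes / GIB
requested_gb = round(requested_bytes / GB, 2)

print(requested_gib)              # 32.0
print(requested_gb)               # 34.36
print(requested_bytes == 2 ** 35) # True
```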
Expected behaviour
Actual behaviour
Your personal set up
Windows 11, Intel(R) Xeon(R) w3-2423 processor, NVIDIA RTX 4000 Ada Generation 20 GB
SLEAP v1.3.3, Python 3.7.12
Environment packages
packages in environment at C:\Users\Sahay\anaconda3\envs\sleap:
Name Version Build Channel
absl-py 1.0.0 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
attrs 21.4.0 pyhd8ed1ab_0 conda-forge
backports-zoneinfo 0.2.1 pypi_0 pypi
brotli 1.1.0 hcfcfb64_1 conda-forge
brotli-bin 1.1.0 hcfcfb64_1 conda-forge
ca-certificates 2024.8.30 h56e8100_0 conda-forge
cached-property 1.5.2 pypi_0 pypi
cachetools 4.2.4 pypi_0 pypi
cattrs 1.1.1 pyhd8ed1ab_0 conda-forge
certifi 2024.7.4 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.0.9 pypi_0 pypi
cloudpickle 2.2.1 pyhd8ed1ab_0 conda-forge
cuda-nvcc 11.3.58 hb8d16a4_0 nvidia
cudatoolkit 11.3.1 hf2f0253_13 conda-forge
cudnn 8.2.1.32 h754d62a_0 conda-forge
cycler 0.11.0 pyhd8ed1ab_0 conda-forge
cytoolz 0.12.0 py37hcc03f2d_0 conda-forge
dask-core 2022.2.0 pyhd8ed1ab_0 conda-forge
efficientnet 1.0.0 pypi_0 pypi
flatbuffers 2.0 pypi_0 pypi
fonttools 4.38.0 py37h51bd9d9_0 conda-forge
freeglut 3.2.2 he0c23c2_3 conda-forge
freetype 2.12.1 hdaf720e_2 conda-forge
fsspec 2023.1.0 pyhd8ed1ab_0 conda-forge
gast 0.4.0 pypi_0 pypi
geos 3.11.0 h39d44d4_0 conda-forge
google-auth 2.3.3 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.43.0 pypi_0 pypi
h5py 3.1.0 pypi_0 pypi
hdmf 3.6.1 pypi_0 pypi
icu 69.1 h0e60522_0 conda-forge
idna 3.3 pypi_0 pypi
image-classifiers 1.0.0 pypi_0 pypi
imagecodecs-lite 2019.12.3 py37h0b711f8_5 conda-forge
imageio 2.35.1 pyh12aca89_0 conda-forge
imgaug 0.4.0 pyhd8ed1ab_1 conda-forge
imgstore 0.2.9 pypi_0 pypi
importlib-metadata 4.2.0 pypi_0 pypi
importlib-resources 5.12.0 pypi_0 pypi
intel-openmp 2024.2.1 h57928b3_1083 conda-forge
jasper 2.0.33 hc2e4405_1 conda-forge
joblib 1.3.2 pyhd8ed1ab_0 conda-forge
jpeg 9e hcfcfb64_3 conda-forge
jsmin 3.0.1 pyhd8ed1ab_0 conda-forge
jsonpickle 1.2 py_0 conda-forge
jsonschema 4.17.3 pypi_0 pypi
keras 2.7.0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
kiwisolver 1.4.4 py37h8c56517_0 conda-forge
lcms2 2.14 h90d422f_0 conda-forge
lerc 4.0.0 h63175ca_0 conda-forge
libblas 3.9.0 23_win64_mkl conda-forge
libbrotlicommon 1.1.0 hcfcfb64_1 conda-forge
libbrotlidec 1.1.0 hcfcfb64_1 conda-forge
libbrotlienc 1.1.0 hcfcfb64_1 conda-forge
libcblas 3.9.0 23_win64_mkl conda-forge
libclang 12.0.0 pypi_0 pypi
libdeflate 1.14 hcfcfb64_0 conda-forge
libhwloc 2.11.1 default_h8125262_1000 conda-forge
libiconv 1.17 hcfcfb64_2 conda-forge
liblapack 3.9.0 23_win64_mkl conda-forge
liblapacke 3.9.0 23_win64_mkl conda-forge
libopencv 4.5.5 py37h542666b_10 conda-forge
libpng 1.6.43 h19919ed_0 conda-forge
libprotobuf 3.20.3 h12be248_0 conda-forge
libsodium 1.0.18 h8d14728_1 conda-forge
libsqlite 3.46.0 h2466b09_0 conda-forge
libtiff 4.4.0 hc4f729c_5 conda-forge
libwebp-base 1.4.0 hcfcfb64_0 conda-forge
libxcb 1.13 hcd874cb_1004 conda-forge
libxml2 2.12.7 h0f24e4e_4 conda-forge
libxslt 1.1.39 h3df6e99_0 conda-forge
libzlib 1.3.1 h2466b09_1 conda-forge
locket 1.0.0 pyhd8ed1ab_0 conda-forge
m2w64-gcc-libgfortran 5.3.0 6 conda-forge
m2w64-gcc-libs 5.3.0 7 conda-forge
m2w64-gcc-libs-core 5.3.0 7 conda-forge
m2w64-gmp 6.1.0 2 conda-forge
m2w64-libwinpthread-git 5.0.0.4634.697f757 2 conda-forge
markdown 3.3.6 pypi_0 pypi
markdown-it-py 2.2.0 pyhd8ed1ab_0 conda-forge
matplotlib-base 3.5.3 py37hbaab90a_2 conda-forge
mdurl 0.1.2 pyhd8ed1ab_0 conda-forge
mkl 2024.1.0 h66d3029_694 conda-forge
msys2-conda-epoch 20160418 1 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
ndx-pose 0.1.1 pypi_0 pypi
networkx 2.7 pyhd8ed1ab_0 conda-forge
nixio 1.5.3 pypi_0 pypi
numpy 1.19.5 pypi_0 pypi
oauthlib 3.1.1 pypi_0 pypi
opencv 4.5.5 py37h03978a9_10 conda-forge
opencv-python-headless 4.2.0.34 pypi_0 pypi
openjpeg 2.5.0 hc9384bd_1 conda-forge
openssl 1.1.1w hcfcfb64_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
packaging 21.3 pypi_0 pypi
pandas 1.3.5 py37h9386db6_0 conda-forge
partd 1.4.1 pyhd8ed1ab_0 conda-forge
patsy 0.5.6 pyhd8ed1ab_0 conda-forge
pillow 9.2.0 py37h42a8222_2 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pypi_0 pypi
protobuf 3.19.1 pypi_0 pypi
psutil 5.9.3 py37h51bd9d9_0 conda-forge
pthread-stubs 0.4 hcd874cb_1001 conda-forge
pthreads-win32 2.9.1 hfa6e2cd_3 conda-forge
py-opencv 4.5.5 py37h90c5f73_10 conda-forge
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pykalman 0.9.7 pyhd8ed1ab_0 conda-forge
pynwb 2.3.3 pypi_0 pypi
pyparsing 3.0.6 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
pyside2 5.13.2 py37h760f651_8 conda-forge
python 3.7.12 h7840368_100_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-rapidjson 1.9 py37h7f67f24_0 conda-forge
python_abi 3.7 4_cp37m conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pywavelets 1.3.0 py37h3a130e4_1 conda-forge
pyyaml 6.0 py37hcc03f2d_4 conda-forge
pyzmq 24.0.1 py37h7347f05_0 conda-forge
qimage2ndarray 1.10.0 pypi_0 pypi
qt 5.12.9 h556501e_6 conda-forge
qtpy 2.4.1 pyhd8ed1ab_0 conda-forge
requests 2.26.0 pypi_0 pypi
requests-oauthlib 1.3.0 pypi_0 pypi
rich 13.7.1 pyhd8ed1ab_0 conda-forge
ruamel-yaml 0.17.32 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
scikit-image 0.19.2 py37h9386db6_0 conda-forge
scikit-learn 1.0 py37ha78be43_1 conda-forge
scikit-video 1.1.11 pyh24bf2e0_0 conda-forge
scipy 1.7.3 py37hb6553fb_0 conda-forge
seaborn 0.12.2 hd8ed1ab_0 conda-forge
seaborn-base 0.12.2 pyhd8ed1ab_0 conda-forge
segmentation-models 1.0.1 pypi_0 pypi
setuptools 59.8.0 py37h03978a9_1 conda-forge
setuptools-scm 6.3.2 pypi_0 pypi
shapely 1.8.5 py37h475e9a0_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sleap 1.3.3 pypi_0 pypi
sqlite 3.46.0 h2466b09_0 conda-forge
statsmodels 0.13.2 py37h3a130e4_0 conda-forge
tbb 2021.12.0 hc790b64_4 conda-forge
tensorboard 2.7.0 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.0 pypi_0 pypi
tensorflow 2.7.0 pypi_0 pypi
tensorflow-estimator 2.7.0 pypi_0 pypi
tensorflow-hub 0.12.0 pyhca92ed8_0 conda-forge
tensorflow-io-gcs-filesystem 0.23.1 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
threadpoolctl 3.1.0 pyh8a188c0_0 conda-forge
tifffile 2020.6.3 py_0 conda-forge
tk 8.6.13 h5226925_1 conda-forge
tomli 2.0.0 pypi_0 pypi
toolz 0.12.1 pyhd8ed1ab_0 conda-forge
typing-extensions 4.0.1 pypi_0 pypi
typing_extensions 4.7.1 pyha770c72_0 conda-forge
tzdata 2023.3 pypi_0 pypi
tzlocal 5.0.1 pypi_0 pypi
ucrt 10.0.22621.0 h57928b3_0 conda-forge
unicodedata2 14.0.0 py37hcc03f2d_1 conda-forge
urllib3 1.26.7 pypi_0 pypi
vc 14.3 h8a93ad2_20 conda-forge
vc14_runtime 14.40.33810 hcc2c482_20 conda-forge
vs2015_runtime 14.40.33810 h3bf8584_20 conda-forge
werkzeug 2.0.2 pypi_0 pypi
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
wrapt 1.13.3 pypi_0 pypi
xorg-libxau 1.0.11 hcd874cb_0 conda-forge
xorg-libxdmcp 1.1.3 hcd874cb_0 conda-forge
xz 5.2.6 h8d14728_0 conda-forge
yaml 0.2.5 h8ffe710_2 conda-forge
zeromq 4.3.4 h0e60522_1 conda-forge
zipp 3.15.0 pypi_0 pypi
zstd 1.5.6 h0ea2cb4_0 conda-forge