Unable to train on an NVIDIA A4000 GPU #51

Open
kawais opened this issue Feb 23, 2022 · 0 comments

kawais commented Feb 23, 2022

Training with the command `python train.py --dataset Hayao --epoch 101 --init_epoch 10` fails with the error `failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED`.
How can I fix this?

Software versions:

```text
# packages in environment at C:\ProgramData\Anaconda3\envs\py36:
#
# Name                    Version           Build                Channel
absl-py                   1.0.0             pypi_0               pypi
astor                     0.8.1             pypi_0               pypi
cached-property           1.5.2             pypi_0               pypi
certifi                   2021.5.30         py36haa95532_0
colorama                  0.4.4             pypi_0               pypi
cudatoolkit               10.0.130          0
cudnn                     7.6.0             cuda10.0_0
dataclasses               0.8               pypi_0               pypi
gast                      0.2.2             pypi_0               pypi
google-pasta              0.2.0             pypi_0               pypi
grpcio                    1.44.0            pypi_0               pypi
h5py                      3.1.0             pypi_0               pypi
importlib-metadata        4.8.3             pypi_0               pypi
keras-applications        1.0.8             pypi_0               pypi
keras-preprocessing       1.1.2             pypi_0               pypi
markdown                  3.3.6             pypi_0               pypi
numpy                     1.19.5            pypi_0               pypi
opencv-python             4.5.5.62          pypi_0               pypi
opt-einsum                3.3.0             pypi_0               pypi
pip                       21.2.2            py36haa95532_0
protobuf                  3.19.4            pypi_0               pypi
python                    3.6.2             h09676a0_15
setuptools                58.0.4            py36haa95532_0
six                       1.16.0            pypi_0               pypi
tensorboard               1.15.0            pypi_0               pypi
tensorflow-estimator      1.15.1            pypi_0               pypi
tensorflow-gpu            1.15.0            pypi_0               pypi
termcolor                 1.1.0             pypi_0               pypi
tqdm                      4.62.3            pypi_0               pypi
typing-extensions         4.1.1             pypi_0               pypi
vc                        14.2              h21ff451_1
vs2015_runtime            14.27.29016       h5e58377_2
werkzeug                  2.0.3             pypi_0               pypi
wheel                     0.37.1            pyhd3eb1b0_0
wincertstore              0.2               py36h7fe50ca_0
wrapt                     1.13.3            pypi_0               pypi
zipp                      3.6.0             pypi_0               pypi
```
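This environment pairs `cudatoolkit` 10.0.130 (the CUDA release `tensorflow-gpu` 1.15.0 was built against) with an A4000. The following is a hedged diagnostic sketch, not part of AnimeGANv2. It parses a `conda list` listing like the one above and checks whether that CUDA release can natively target the GPU. It assumes the A4000 has compute capability 8.6, and that CUDA 10.0 compiles native code only up to 7.5, with 8.6 first supported in CUDA 11.1. The table `MAX_SM_BY_CUDA` and the helpers `parse_conda_list` and `cuda_supports` are hypothetical names. A failed check suggests a likely cause of the cuBLAS error but does not prove it.

```python
from typing import Dict, Tuple

# Hypothetical table: newest GPU compute capability each CUDA release can
# compile native code for. Only the releases relevant here are listed.
MAX_SM_BY_CUDA: Dict[str, Tuple[int, int]] = {
    "10.0": (7, 5),  # Turing
    "11.0": (8, 0),  # A100 (GA100)
    "11.1": (8, 6),  # GA10x, e.g. the A4000
}


def parse_conda_list(text: str) -> Dict[str, str]:
    """Map package name -> version from `conda list` output, skipping '#' lines."""
    packages: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/comment line
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def cuda_supports(cudatoolkit_version: str, gpu_sm: Tuple[int, int]) -> bool:
    """True if gpu_sm is not newer than the release's highest native target.

    Only the upper bound is checked; architectures too old for a release are ignored.
    """
    major_minor = ".".join(cudatoolkit_version.split(".")[:2])
    if major_minor not in MAX_SM_BY_CUDA:
        raise ValueError(f"CUDA release {major_minor} is not in the table")
    return gpu_sm <= MAX_SM_BY_CUDA[major_minor]


if __name__ == "__main__":
    listing = """
# Name                    Version           Build                Channel
cudatoolkit               10.0.130          0
cudnn                     7.6.0             cuda10.0_0
tensorflow-gpu            1.15.0            pypi_0               pypi
"""
    pkgs = parse_conda_list(listing)
    a4000_sm = (8, 6)  # assumed compute capability of the A4000
    print(pkgs["cudatoolkit"], cuda_supports(pkgs["cudatoolkit"], a4000_sm))
```

With the listing above, this prints `10.0.130 False`. Python compares the version tuples element by element, so the comparison needs no extra parsing.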

Error log:
```text
2022-02-23 11:58:11.480753: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-02-23 11:58:11.483857: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-02-23 11:58:11.483905: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A7DCBCE8; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.486156: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C038; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.490415: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C058; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.488390: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A7DCBD08; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.493417: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
2022-02-23 11:58:11.496159: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
2022-02-23 11:58:11.497784: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C038; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.503459: I tensorflow/stream_executor/stream.cc:4976] [stream=000001E1AF327560,impl=000001E1A391B2F0] did not memset GPU location; source: 000000B3A760C058; size: 8388608; pattern: ffffffff
2022-02-23 11:58:11.505625: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
2022-02-23 11:58:11.844846: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at conv_ops.cc:1006 : Not found: No algorithm worked!
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1365, in _do_call
    return fn(*args)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
     [[{{node Tensordot/MatMul}}]]
     [[mul_10/_893]]
  (1) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
     [[{{node Tensordot/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 101, in <module>
    main()
  File "train.py", line 96, in main
    gan.train()
  File "E:\AnimeGANv2-master\AnimeGANv2.py", line 248, in train
    self.Generator_loss, self.G_loss_merge], feed_dict = train_feed_dict)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
    run_metadata)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
     [[node Tensordot/MatMul (defined at C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
     [[mul_10/_893]]
  (1) Internal: Blas GEMM launch failed : a.shape=(786432, 3), b.shape=(3, 3), m=786432, n=3, k=3
     [[node Tensordot/MatMul (defined at C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'Tensordot/MatMul':
  File "train.py", line 101, in <module>
    main()
  File "train.py", line 91, in main
    gan.build_model()
  File "E:\AnimeGANv2-master\AnimeGANv2.py", line 155, in build_model
    t_loss = self.con_weight * c_loss + self.sty_weight * s_loss + color_loss(self.real,self.generated) * self.color_weight + tv_loss
  File "E:\AnimeGANv2-master\tools\ops.py", line 280, in color_loss
    con = rgb2yuv(con)
  File "E:\AnimeGANv2-master\tools\ops.py", line 309, in rgb2yuv
    return tf.image.rgb_to_yuv(rgb)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\image_ops_impl.py", line 2930, in rgb_to_yuv
    return math_ops.tensordot(images, kernel, axes=[[ndims - 1], [0]])
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 4071, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\util\dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 2754, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 6136, in mat_mul
    name=name)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "C:\ProgramData\Anaconda3\envs\py36\lib\site-packages\tensorflow_core\python\framework\ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
```
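The stack trace shows that the failing `Tensordot/MatMul` node comes from `tf.image.rgb_to_yuv`, which `color_loss` calls through `rgb2yuv`. TensorFlow implements that conversion as a `tensordot` of the image batch with a 3×3 kernel. The NumPy sketch below mirrors that reshape-then-GEMM on the CPU to show why cuBLAS reports `a.shape=(786432, 3)` and `b.shape=(3, 3)`: every pixel in the batch becomes one row. Any batch whose N × H × W equals 786432 (hypothetically 12 × 256 × 256) would produce the logged shape. The kernel coefficients are assumed to match TensorFlow's internal RGB-to-YUV kernel, and `rgb_to_yuv_as_gemm` is a hypothetical name. Because this multiply is small (n = k = 3), a launch failure here plausibly points at the CUDA/cuBLAS setup rather than the AnimeGANv2 code, but the sketch is an illustration, not a fix.

```python
import numpy as np

# Assumed to match TensorFlow's RGB -> YUV kernel (rows: R, G, B; columns: Y, U, V).
RGB_TO_YUV_KERNEL = np.array([
    [0.299, -0.14714119, 0.61497538],
    [0.587, -0.28886916, -0.51496512],
    [0.114, 0.43601035, -0.10001026],
], dtype=np.float32)


def rgb_to_yuv_as_gemm(images: np.ndarray):
    """Return (yuv, gemm_shapes) for an NHWC float image batch.

    Mirrors tensordot(images, kernel, axes=[[3], [0]]): flatten all leading
    axes into rows, run one (m, 3) x (3, 3) matrix multiply, reshape back.
    """
    a = images.reshape(-1, 3)       # one row per pixel: shape (m, 3)
    yuv = a @ RGB_TO_YUV_KERNEL     # the single GEMM that cuBLAS is asked to run
    shapes = {"a": a.shape, "b": RGB_TO_YUV_KERNEL.shape,
              "m": a.shape[0], "n": 3, "k": 3}
    return yuv.reshape(images.shape), shapes


if __name__ == "__main__":
    batch = np.random.rand(2, 4, 4, 3).astype(np.float32)
    yuv, shapes = rgb_to_yuv_as_gemm(batch)
    print(shapes["a"], shapes["m"])  # (32, 3) 32
```

A pure white pixel `(1, 1, 1)` maps to approximately `(1, 0, 0)`, because the Y column sums to 1 and the U and V columns each sum to 0. This makes a quick sanity check for the kernel values.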
