Segmentfault in multiprocessing DataLoader when training on Kunpeng cpu #2506
Comments
PR welcome
MengqingCao added a commit to MengqingCao/wenet that referenced this issue (Apr 28, 2024)
MengqingCao added a commit to MengqingCao/wenet that referenced this issue (Apr 28, 2024)
MengqingCao added a commit to MengqingCao/wenet that referenced this issue (Apr 28, 2024)
xingchensong pushed a commit that referenced this issue (May 2, 2024)
xingchensong added a commit that referenced this issue (May 8, 2024)
xingchensong added a commit that referenced this issue (May 8, 2024)
MengqingCao added a commit to MengqingCao/wenet that referenced this issue (May 15, 2024)
MengqingCao added a commit to MengqingCao/wenet that referenced this issue (May 16, 2024)
- fix segmentfault in Kunpeng (wenet-e2e#2506)
- avoids the repeated initialization of deepspeed in (wenet-e2e#2507)
MengqingCao added a commit to MengqingCao/wenet that referenced this issue (May 16, 2024)
- fix segmentfault in Kunpeng (wenet-e2e#2506)
- avoids the repeated initialization of deepspeed causing by (wenet-e2e#2507)
xingchensong pushed a commit that referenced this issue (May 17, 2024)
Describe the bug
A segmentation fault occurs while train.py is running. It happens when the DataLoader creates its worker processes.
The log:
The stack printout:
To Reproduce
Steps to reproduce the behavior:
cd ./examples/aishell/s0
bash run.sh
When the script reaches stage 4 (running train.py), the segmentation fault occurs.
Expected behavior
No segmentation fault.
Screenshots
Desktop (please complete the following information):
Additional context
I have confirmed that the error is caused by the way the worker processes are created. Setting the multiprocessing context to spawn, that is, passing multiprocessing_context=mp.get_context("spawn") to the DataLoader, solves the problem. As far as I know, the spawn start method works on most systems (Windows, all POSIX platforms, and macOS).
If this solution is approved, I will submit a PR. Let me know if you have any suggestions.
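For illustration, here is a minimal sketch of the proposed workaround. The helper names loader_kwargs and make_loader are hypothetical and are not part of wenet; the actual change in train.py may look different. The sketch only assumes the multiprocessing_context argument of torch.utils.data.DataLoader mentioned above and the standard multiprocessing.get_context call.

```python
import multiprocessing as mp


def loader_kwargs(num_workers, start_method="spawn"):
    """Build DataLoader keyword arguments that pin the worker start method.

    Hypothetical helper for illustration only; not part of wenet.
    """
    kwargs = {"num_workers": num_workers}
    if num_workers > 0:
        # DataLoader accepts multiprocessing_context only when worker
        # processes are actually used (num_workers > 0).
        kwargs["multiprocessing_context"] = mp.get_context(start_method)
    return kwargs


def make_loader(dataset, batch_size=16, num_workers=4):
    """Create a DataLoader whose workers are started with spawn."""
    # Imported lazily so loader_kwargs can be used without PyTorch installed.
    from torch.utils.data import DataLoader

    return DataLoader(dataset, batch_size=batch_size,
                      **loader_kwargs(num_workers))
```

Note that with spawn each worker starts a fresh interpreter, so the dataset and any collate function must be picklable, the entry script should be protected by an `if __name__ == "__main__":` guard, and worker startup is typically slower than with fork.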