fallocate is interrupted by signal at startup #1368
Comments
@MatiasElo Do you have any comments on this?
Hmm, this is the first time I've seen this failure. Does this happen constantly or was it a random occurrence? Also, what was the return code of
The error is easy to reproduce in a k8s environment, with a recurrence rate of about 10%. I think the fallocate return code is EINTR (Interrupted system call). The size is around 4M.
Thanks for the info. Looks like a good solution would be to add a number of retries if EINTR is received.
Does this change fix the issue you are seeing?
Strange, the issue no longer reproduces after I recompiled. I'll update later.
Update:
Another, similar issue is that I sometimes hit a SIGSEGV in DPDK, called from odp_pktio_start(), at startup. Thanks.
Hmm, I haven't had to trace signals before, so unfortunately I cannot help much. Usually I just isolate the data plane cores and redirect all signals to a set of control cores. One thing which pops out in your log is
fallocate() (and ftruncate()) may fail due to system interrupts, so retry the operation FALLOCATE_RETRIES times.
Fixes: OpenDataPlane#1368
Signed-off-by: Matias Elo <matias.elo@nokia.com>
Reported-and-tested-by: Christian Hong <guochun.hgc@alibaba-inc.com>
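For readers following along, the bounded-retry idea in this commit message could look roughly like the sketch below. This is not the actual patch: the helper name `fallocate_retry` is hypothetical, and the value of `FALLOCATE_RETRIES` is an assumption because the thread does not state it. The loop retries only when `fallocate()` fails with `EINTR` and gives up after `FALLOCATE_RETRIES` attempts in total.

```c
#define _GNU_SOURCE /* fallocate() is a Linux-specific call declared in <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Assumed value: the limit used by the real patch is not given in the thread. */
#define FALLOCATE_RETRIES 10

/*
 * Hypothetical helper: call fallocate() and retry only if it was interrupted
 * by a signal (errno == EINTR). Any other error is returned immediately.
 * At most FALLOCATE_RETRIES attempts are made in total.
 */
static int fallocate_retry(int fd, int mode, off_t offset, off_t len)
{
	int ret;
	int attempts = 0;

	do {
		ret = fallocate(fd, mode, offset, len);
	} while (ret == -1 && errno == EINTR && ++attempts < FALLOCATE_RETRIES);

	return ret;
}
```

A caller would use `fallocate_retry(fd, 0, 0, size)` in place of a direct `fallocate()` call and still check for `-1`, since the call can fail with `EINTR` once the attempt limit is reached or with any other error.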
fallocate() (and ftruncate()) may fail due to system interrupts, so retry using TEMP_FAILURE_RETRY macro.
Fixes: OpenDataPlane#1368
Signed-off-by: Matias Elo <matias.elo@nokia.com>
Reported-and-tested-by: Christian Hong <guochun.hgc@alibaba-inc.com>
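As an illustration of the macro-based variant, here is a minimal sketch. It is not the actual patch: the helper name `alloc_file_space` and the order of the two calls are assumptions. `TEMP_FAILURE_RETRY` is a glibc extension from `<unistd.h>` (available with `_GNU_SOURCE`) that re-evaluates its expression for as long as it returns `-1` with `errno` set to `EINTR`, so unlike a counted loop it has no retry limit.

```c
#define _GNU_SOURCE /* needed for TEMP_FAILURE_RETRY and fallocate() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the file size and reserve its blocks, with each
 * system call repeated while it fails with EINTR. Returns 0 on success and
 * -1 on any other error, with errno set by the failing call.
 */
static int alloc_file_space(int fd, off_t len)
{
	if (TEMP_FAILURE_RETRY(ftruncate(fd, len)) == -1)
		return -1;

	if (TEMP_FAILURE_RETRY(fallocate(fd, 0, 0, len)) == -1)
		return -1;

	return 0;
}
```

The trade-off against a counted retry is simplicity versus boundedness: the macro keeps the call site short, but if signals keep arriving the call is retried indefinitely.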
We hit a pool creation failure in our system. The error shows that the fallocate system call was interrupted:
"odp_ishm.c:707:create_file():Huge page memory allocation failed: fd=582, file=/dev/hugepages/0/odp-16-ishm-pool_008_pkt-rx:7-0, err="Interrupted system call""
Would it be better to retry the system call after it returns this error?
Why the signal is raised is not yet known.