
Robustness: Add timeouts to loops #102

Open
fhaverkamp opened this issue Jan 16, 2018 · 1 comment

Comments

@fhaverkamp
Contributor

fhaverkamp commented Jan 16, 2018

We recently found a case where we called cxl_afu_free() while some operations were still in progress. This was a timeout case: we did not see completion within a certain amount of time, so we wanted to release the AFU, print an error message and bail out.

Unfortunately libcxl was not willing to let us go and ended up looping in the following code:

	while (afu->attached)	/*infinite loop */
		_delay_1ms();

The code that seemed to be responsible for modifying afu->attached was not working as expected. We think it was waiting for an ACK from the hardware that never arrived because of an incomplete implementation. As a result, the pslse variant of libcxl got stuck, and our application with it. Only attaching gdb to the hanging process helped us figure that out.

So could you please look through your code for loops without an emergency exit condition and consider fixing those, so that we no longer see these hangs? I am aware that this is an error condition, but I still think we should handle it as smoothly as possible. Thanks.
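For illustration, here is a minimal sketch of such an emergency exit, assuming a polling loop like the one quoted above. `struct afu`, `delay_1ms()` and `wait_for_detach()` are hypothetical stand-ins, not libcxl's actual code. The flag is modeled as a C11 `atomic_int` so that it can safely be cleared from another thread, and the caller chooses the timeout value.

```c
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() */
#include <errno.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the AFU handle; only the flag matters here. */
struct afu {
	atomic_int attached;	/* cleared elsewhere once detach completes */
};

/* Sleep for roughly one millisecond, like _delay_1ms() in the loop above. */
static void delay_1ms(void)
{
	struct timespec ts = { 0, 1000000L };	/* 1 ms */
	nanosleep(&ts, NULL);
}

/*
 * Wait until afu->attached drops to 0, but give up after timeout_ms
 * polls of at least ~1 ms each. Returns 0 on success, or -1 with errno
 * set to ETIMEDOUT so the caller can log an error and keep tearing down
 * instead of hanging forever.
 */
static int wait_for_detach(struct afu *afu, unsigned int timeout_ms)
{
	unsigned int waited = 0;

	while (atomic_load(&afu->attached)) {
		if (waited >= timeout_ms) {
			errno = ETIMEDOUT;	/* emergency exit */
			return -1;
		}
		delay_1ms();
		waited++;
	}
	return 0;
}
```

Counting polls keeps the sketch close to the original loop. Because each sleep lasts at least 1 ms, the real wait is never shorter than `timeout_ms`. A deadline computed with `clock_gettime(CLOCK_MONOTONIC, ...)` would bound the wait more precisely. A caller in the teardown path could then print an error on `-1` and continue freeing resources.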

@umarajag
Collaborator

@fhaverkamp, we have added a 3m timeout at that location. This is now part of the HEAD checkout. Can you please check whether this solves the issue?
