We recently hit a case where we called cxl_afu_free() while some operations were still in progress. It was a timeout situation: we did not see completion within a certain amount of time, so we wanted to release the AFU, print an error message, and exit.
Unfortunately libcxl was not willing to let us go and ended up looping in the following code:
while (afu->attached) /*infinite loop */
_delay_1ms();
The code responsible for modifying afu->attached did not seem to work as expected. We think it was waiting for an ACK from the hardware that never arrived because of an incomplete implementation. As a result, the pslse variant of libcxl was stuck, and our application with it. We only figured this out by attaching gdb to the hanging process.
Could you please look through your code for loops without an emergency exit condition and consider fixing them, so that we do not see these hangs anymore? I am aware that this is an error condition, but I still think we should handle it as smoothly as possible. Thanks.
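As a rough illustration of the kind of exit condition meant here, the wait on the attached flag could be bounded by a timeout. This is only a sketch under assumptions: struct afu_sketch, delay_1ms() and wait_detached() are hypothetical stand-ins, not the actual libcxl structures or functions, and the right timeout value is something only the libcxl authors can judge.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for nanosleep() under strict C modes */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the AFU handle; only the flag matters here. */
struct afu_sketch {
    volatile int attached;  /* cleared elsewhere once the detach is acknowledged */
};

/* Sleep for roughly one millisecond. */
static void delay_1ms(void)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Wait until afu->attached is cleared, but give up after about timeout_ms
 * milliseconds. Returns 0 if the AFU detached, -1 if the wait timed out.
 */
static int wait_detached(struct afu_sketch *afu, unsigned int timeout_ms)
{
    unsigned int waited_ms = 0;

    while (afu->attached) {
        if (waited_ms >= timeout_ms)
            return -1;  /* emergency exit: no ACK arrived in time */
        delay_1ms();
        waited_ms++;
    }
    return 0;
}
```

With something like this, a caller in the free path could check the return value, report the timeout, and continue releasing resources instead of hanging.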