Is your feature request related to a problem? Please describe.
This report is similar to #1160 and #1164 because a resource, in this case a thread, gets destroyed by pthread_cancel while another thread wants to sem_wait or sem_post on a semaphore that the destroyed thread had just waited on or posted to.
This is how the pthread_join called by OS_TaskDelete gets deadlocked on macOS:
Is it undefined behavior when a thread gets pthread_cancelled while waiting or posting on a semaphore? This at least seems to be the case on macOS where the pthread_join deadlocks on a cancelled thread.
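For reference, POSIX lists sem_wait as a cancellation point, so a deferred cancel can take effect inside it. One conventional way to keep a cancel from landing in the middle of an operation is to disable cancellation around the operation and only allow it at an explicit pthread_testcancel call. The sketch below shows that pattern with a mutex-protected counter standing in for the semaphore work. The names guarded_worker and cancel_and_join_guarded_worker are hypothetical, and I have not checked whether this avoids the macOS deadlock described here.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long counter; /* stand-in for the shared semaphore work */

/* Hypothetical worker: cancellation can only take effect at pthread_testcancel. */
static void *guarded_worker(void *arg)
{
    (void)arg;
    for (;;) {
        int old_state;
        int ignored;

        /* No cancel can be acted upon while the shared state is being touched. */
        pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
        pthread_setcancelstate(old_state, &ignored);

        /* The only place where a pending cancel is honored. */
        pthread_testcancel();
    }
}

/* Returns 1 if the worker ended through cancellation, 0 if not, -1 on error. */
static int cancel_and_join_guarded_worker(void)
{
    pthread_t thread;
    void *result = NULL;
    struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    if (pthread_create(&thread, NULL, guarded_worker, NULL) != 0)
        return -1;
    nanosleep(&pause, NULL); /* let the worker run for a moment */
    if (pthread_cancel(thread) != 0)
        return -1;
    if (pthread_join(thread, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED ? 1 : 0;
}
```

Note the limitation: if the guarded section contained a blocking sem_wait that never returned, the cancel would never be delivered and the join would still hang, so this only helps when the guarded operations are short.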
Describe the solution you'd like
With all due appreciation of the testing setup created in sem-speed-test, the thread loops of task 1 and task 2 could be told explicitly when to finish, so that pthread_cancel does not catch the threads while they are still operating on the semaphores.
Describe alternatives you've considered
For now, I have added a simple hack to task 1 and task 2: each thread loop also checks a global flag:
bool task_1_done = false;
bool task_2_done = false;
...
while (!task_1_done && task_1_work < SEMTEST_WORK_LIMIT) {
    ...
}
while (!task_2_done && task_2_work < SEMTEST_WORK_LIMIT) {
    ...
}
And then before actually deleting the tasks:
/* Give the initial sem that starts the loop */
SEMOP(Give)(sem_id_1);

/* Time Limited Execution */
OS_TaskDelay(5000);

// Let the threads finish their job.
task_1_done = true;
task_2_done = true;
OS_TaskDelay(1000);

// TODO: Deleting task is sometimes OS_SUCCESS and sometimes OS_ERR_INVALID_ID
status = OS_TaskDelete(task_1_id);
// UtAssert_True(status == OS_ERR_INVALID_ID, "Task 1 delete Rc=%d", (int)status);
status = OS_TaskDelete(task_2_id);
// UtAssert_True(status == OS_ERR_INVALID_ID, "Task 2 delete Rc=%d", (int)status);
With this change, the pthread_cancel followed by pthread_join does not block on macOS.
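For illustration, here is a minimal standalone sketch of the same idea using plain pthreads rather than OSAL calls: each loop checks a stop flag, and the controller sets the flags and then joins, so no thread is cancelled mid-operation. The names worker_t, worker_main, run_workers, and WORK_LIMIT are hypothetical and not part of sem-speed-test. Using C11 atomics for the flags and counters is my assumption, chosen to avoid the unsynchronized access that a plain bool would have.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define WORK_LIMIT 10000000u /* hypothetical stand-in for SEMTEST_WORK_LIMIT */

typedef struct {
    atomic_bool done; /* set by the controller to request an orderly exit */
    atomic_uint work; /* number of loop iterations completed */
} worker_t;

/* Loop until asked to stop or until the work limit is reached. */
static void *worker_main(void *arg)
{
    worker_t *w = arg;
    while (!atomic_load(&w->done) && atomic_load(&w->work) < WORK_LIMIT) {
        atomic_fetch_add(&w->work, 1);
    }
    return NULL;
}

/* Run two workers for run_ms milliseconds, ask them to stop, then join.
 * Returns 0 on success and stores each worker's iteration count. */
static int run_workers(long run_ms, unsigned *work_1, unsigned *work_2)
{
    worker_t w1, w2;
    pthread_t t1, t2;
    struct timespec pause = { .tv_sec = run_ms / 1000,
                              .tv_nsec = (run_ms % 1000) * 1000000L };

    atomic_init(&w1.done, false);
    atomic_init(&w1.work, 0);
    atomic_init(&w2.done, false);
    atomic_init(&w2.work, 0);

    if (pthread_create(&t1, NULL, worker_main, &w1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker_main, &w2) != 0) {
        atomic_store(&w1.done, true); /* stop the first worker before bailing out */
        pthread_join(t1, NULL);
        return -1;
    }

    nanosleep(&pause, NULL); /* time-limited execution */

    /* Request exit and wait for it; no pthread_cancel is involved. */
    atomic_store(&w1.done, true);
    atomic_store(&w2.done, true);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    *work_1 = atomic_load(&w1.work);
    *work_2 = atomic_load(&w2.work);
    return 0;
}
```

A caller would do something like `unsigned a, b; run_workers(5000, &a, &b);`. Because the join happens only after each loop has seen its flag, the threads are never torn down while they are inside a blocking call.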
Additional context
This behavior is 100% reproducible on macOS, on the branch of #1161.
I have also applied Clang's Thread Sanitizer to this and other tests. The thread sanitizer immediately complains about possible races caused by unprotected access to the global variables managed by the tests. This could become a separate ticket once the more trivial issues reported so far are resolved.
Requester Info
Stanislav Pankevich (Personal contribution).
stanislaw changed the title from "sem-speed-test: deadlocks sometimes because pthread_cancel is called on one of the threads while the other is waiting on a semaphore" to "sem-speed-test: deadlocks sometimes when pthread_cancel is called on the threads that are actively using semaphores" on Sep 22, 2021