
sem-speed-test: deadlocks sometimes when pthread_cancel is called on the threads that are actively using semaphores #1165

Open
Tracked by #1161
stanislaw opened this issue Sep 22, 2021 · 0 comments
Labels
bug unit-test Tickets related to the OSAL unit testing (functional and/or coverage)


@stanislaw
Contributor

Is your feature request related to a problem? Please describe.

This report is similar to #1160 and #1164: a resource, in this case a thread, is destroyed by pthread_cancel while another thread is about to sem_wait or sem_post on a semaphore that the destroyed thread itself was waiting on or posting to just before it was cancelled.

This is where pthread_join, called from OS_TaskDelete, deadlocks on macOS:

(lldb) thread backtrace 
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff203b59ee libsystem_kernel.dylib`__ulock_wait + 10
    frame #1: 0x00007fff203eaf60 libsystem_pthread.dylib`_pthread_join + 362
    frame #2: 0x000000010614f389 sem-speed-test`OS_TaskDelete_Impl(token=0x00007ffee9ac08f0) at os-impl-tasks.c:694:15
    frame #3: 0x000000010614ad9d sem-speed-test`OS_TaskDelete(task_id=65537) at osapi-task.c:239:23
    frame #4: 0x000000010613fea1 sem-speed-test`SemRun at sem-speed-test.c:216:14
    frame #5: 0x0000000106142839 sem-speed-test`UtTest_Run at uttest.c:174:17
    frame #6: 0x0000000106141e29 sem-speed-test`OS_Application_Run at utbsp.c:232:5
    frame #7: 0x0000000106154fca sem-speed-test`main(argc=1, argv=0x00007ffee9ac0a08) at bsp_start.c:247:5
    frame #8: 0x00007fff20404f3d libdyld.dylib`start + 1
    frame #9: 0x00007fff20404f3d libdyld.dylib`start + 1

Is it undefined behavior when a thread is cancelled with pthread_cancel while it is waiting on or posting to a semaphore? It at least appears so on macOS, where pthread_join deadlocks on the cancelled thread.
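For reference, POSIX lists sem_wait (and sem_timedwait) as cancellation points, while sem_post is not one, so with the default deferred cancellation type a thread blocked in sem_wait can be cancelled and joined in a well-defined way; what is lost is any sem_post it would have made afterwards. The minimal sketch below uses plain POSIX threads rather than OSAL and assumes Linux/glibc (as far as we know, macOS does not implement unnamed semaphores created with sem_init). It cancels a thread blocked in sem_wait, joins it, and shows that a partner waiting for that thread's post can only time out. The names sem_a, sem_b, waiter and cancel_blocked_waiter are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical hand-off: waiter() posts sem_b only after acquiring sem_a. */
static sem_t sem_a, sem_b;

static void *waiter(void *arg)
{
    (void)arg;
    /* sem_wait is a cancellation point, so a deferred cancel request
     * (the default cancel type) takes effect while blocked here. */
    while (sem_wait(&sem_a) != 0 && errno == EINTR) {
        /* interrupted by a signal: wait again */
    }
    sem_post(&sem_b); /* never reached when the thread is cancelled */
    return NULL;
}

/* Returns 0 if the cancelled thread joined with PTHREAD_CANCELED and the
 * post it never made leaves a bounded wait on sem_b timing out. */
int cancel_blocked_waiter(void)
{
    pthread_t tid;
    void *result = NULL;
    struct timespec deadline;
    int wait_rc;
    int rc = -1;

    if (sem_init(&sem_a, 0, 0) != 0)
        return -1;
    if (sem_init(&sem_b, 0, 0) != 0) {
        sem_destroy(&sem_a);
        return -1;
    }

    if (pthread_create(&tid, NULL, waiter, NULL) == 0) {
        pthread_cancel(tid);        /* request cancellation */
        pthread_join(tid, &result); /* returns once the thread has exited */

        /* Wait at most 100 ms for the sem_post the cancelled thread skipped. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += 100L * 1000L * 1000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        while ((wait_rc = sem_timedwait(&sem_b, &deadline)) != 0 && errno == EINTR) {
            /* interrupted by a signal: keep waiting until the deadline */
        }
        if (result == PTHREAD_CANCELED && wait_rc != 0 && errno == ETIMEDOUT)
            rc = 0;
    }

    sem_destroy(&sem_a);
    sem_destroy(&sem_b);
    return rc;
}
```

If this holds on Linux but the equivalent OSAL sequence hangs on macOS, that would point at the macOS semaphore/cancellation path used by the #1161 branch rather than at POSIX semantics in general; this is a hypothesis, not something the sketch proves.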

Describe the solution you'd like

With all due appreciation for the test setup in sem-speed-test, the loops of task 1 and task 2 could be told explicitly when to finish their work, so that pthread_cancel never catches either thread while it is still operating on the semaphores.
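A minimal sketch of that idea in plain POSIX threads (not OSAL), assuming Linux/glibc and C11 `<stdatomic.h>`: two tasks ping-pong on sem_1/sem_2 as in sem-speed-test, a shared atomic_bool asks both loops to stop, and the controller then posts each semaphore once so that whichever task is blocked in sem_wait wakes up, sees the flag and returns on its own. Both threads are then reaped with pthread_join, so pthread_cancel is never needed. The names pingpong_task, run_pingpong, stop_requested and struct pingpong are hypothetical; using an atomic flag rather than a plain bool also avoids the kind of unsynchronized access ThreadSanitizer reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the state shared by task 1 and task 2. */
static sem_t sem_1, sem_2;
static atomic_bool stop_requested;
static atomic_ulong task_1_work, task_2_work;

struct pingpong {
    sem_t *wait_on;     /* semaphore that gives this task its turn */
    sem_t *post_to;     /* semaphore that hands the turn to the partner */
    atomic_ulong *work; /* iterations completed by this task */
};

static void *pingpong_task(void *arg)
{
    struct pingpong *p = arg;

    while (!atomic_load(&stop_requested)) {
        if (sem_wait(p->wait_on) != 0)
            continue; /* e.g. EINTR: re-check the flag, then wait again */
        if (atomic_load(&stop_requested))
            break;    /* stop requested while waiting: do not hand over */
        atomic_fetch_add(p->work, 1);
        sem_post(p->post_to);
    }
    return NULL;
}

/* Runs the ping-pong for about run_ms milliseconds, then stops both tasks
 * cooperatively. Returns 0 when both threads were joined without cancellation. */
int run_pingpong(long run_ms, unsigned long *work_1, unsigned long *work_2)
{
    struct pingpong p1 = { &sem_1, &sem_2, &task_1_work };
    struct pingpong p2 = { &sem_2, &sem_1, &task_2_work };
    struct timespec delay = { run_ms / 1000, (run_ms % 1000) * 1000000L };
    pthread_t t1, t2;
    int rc = 0;

    atomic_store(&stop_requested, false);
    atomic_store(&task_1_work, 0);
    atomic_store(&task_2_work, 0);
    if (sem_init(&sem_1, 0, 0) != 0 || sem_init(&sem_2, 0, 0) != 0)
        return -1;
    if (pthread_create(&t1, NULL, pingpong_task, &p1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, pingpong_task, &p2) != 0) {
        atomic_store(&stop_requested, true);
        sem_post(&sem_1); /* wake task 1 so it can exit */
        pthread_join(t1, NULL);
        return -1;
    }

    sem_post(&sem_1);        /* give the initial sem that starts the loop */
    nanosleep(&delay, NULL); /* time-limited execution */

    atomic_store(&stop_requested, true);
    sem_post(&sem_1);        /* wake whichever task is blocked in sem_wait */
    sem_post(&sem_2);
    rc |= pthread_join(t1, NULL);
    rc |= pthread_join(t2, NULL);

    *work_1 = atomic_load(&task_1_work);
    *work_2 = atomic_load(&task_2_work);
    sem_destroy(&sem_1);
    sem_destroy(&sem_2);
    return rc;
}
```

For example, `run_pingpong(50, &w1, &w2)` runs the loop for roughly 50 ms; because the tasks strictly alternate, `w2 <= w1 <= w2 + 1` should hold afterwards. The two shutdown sem_post calls are needed in this variant because a task that sees the flag right after waking breaks out without handing the turn back, so without them its partner could stay blocked in sem_wait.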

Describe alternatives you've considered

For now, I have added a simple hack to task 1 and task 2: their thread loops now also depend on two global flags:

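#include <stdbool.h>

/* Set to true before the tasks are deleted, to let each loop finish */
bool      task_1_done = false;
bool      task_2_done = false;

...
/* task 1 loop */
while (!task_1_done && task_1_work < SEMTEST_WORK_LIMIT) {
...
}

/* task 2 loop */
while (!task_2_done && task_2_work < SEMTEST_WORK_LIMIT) {
...
}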

Then, in SemRun, before actually deleting the tasks:

    /* Give the initial sem that starts the loop */
    SEMOP(Give)(sem_id_1);

    /* Time Limited Execution */
    OS_TaskDelay(5000);

    // Let the threads finish their job.
    task_1_done = true;
    task_2_done = true;
    OS_TaskDelay(1000);

    // TODO: OS_TaskDelete sometimes returns OS_SUCCESS and sometimes OS_ERR_INVALID_ID
    status = OS_TaskDelete(task_1_id);
    // UtAssert_True(status == OS_ERR_INVALID_ID, "Task 1 delete Rc=%d", (int)status);

    status = OS_TaskDelete(task_2_id);
    // UtAssert_True(status == OS_ERR_INVALID_ID, "Task 2 delete Rc=%d", (int)status);

With this change, pthread_cancel followed by pthread_join no longer blocks on macOS.

Additional context

This behavior is 100% reproducible on macOS with the branch from #1161.

I have also run this and other tests under Clang's ThreadSanitizer. It immediately reports possible data races on unsynchronized accesses to the global variables shared by the tests. That could become a separate ticket once the more trivial issues reported so far are resolved.

Requester Info

Stanislav Pankevich (Personal contribution).

@stanislaw stanislaw changed the title sem-speed-test: deadlocks sometimes because pthread_cancel is called on one of the threads while the other is waiting on a semaphore sem-speed-test: deadlocks sometimes when pthread_cancel is called on the threads that are actively using semaphores Sep 22, 2021
@skliper skliper added bug unit-test Tickets related to the OSAL unit testing (functional and/or coverage) labels Sep 28, 2021
jphickey pushed a commit to jphickey/osal that referenced this issue Aug 10, 2022