

Fixed blocking calls to ExecutionEngine::process() #3

Conversation

@meyerj (Collaborator) commented Apr 13, 2015


Broadcasting condition variables without holding the associated mutex can block in some cases.
This patch adds MutexLocks wherever the msg_cond is triggered.

From http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_broadcast.html:
> The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether
> or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait()
> have associated with the condition variable during their waits; however, if predictable scheduling
> behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or
> pthread_cond_signal().
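
For illustration, here is the same pattern in portable C++11, with std::mutex and std::condition_variable standing in for RTT's os::Mutex and os::Condition (the message_pending flag is invented for this sketch; the real code uses the state of the message queue as the predicate):

#include <condition_variable>
#include <mutex>

std::mutex msg_lock;
std::condition_variable msg_cond;
bool message_pending = false; // hypothetical predicate for the sketch

// Signalling side, as after the patch: the mutex is held while notifying,
// which is what POSIX recommends for predictable scheduling behaviour.
void signal_message() {
  std::lock_guard<std::mutex> lock(msg_lock);
  message_pending = true;
  msg_cond.notify_all();
}

// Waiting side: the predicate is re-checked under the same mutex, which
// closes the window between checking it and blocking in wait().
void wait_for_message() {
  std::unique_lock<std::mutex> lock(msg_lock);
  msg_cond.wait(lock, [] { return message_pending; });
  message_pending = false;
}

The key point is that the signalling side now cannot run between the waiter's predicate check and its call to wait(), because both happen under msg_lock.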

This patch also clears the Thread active flag when Thread::terminate() is called and forces the thread
function to be terminated in the destructor of class Activity. This solves some issues with pure virtual
method calls during the destruction phase if threads are still running and triggering each other.

Signed-off-by: Johannes Meyer <johannes@intermodalics.eu>
psoetens pushed a commit that referenced this pull request Apr 13, 2015
Fixed blocking calls to ExecutionEngine::process()
@psoetens psoetens merged commit 8d53590 into psoetens:master-update-hook-vs-callback-queue Apr 13, 2015
@meyerj (Collaborator, Author) commented Feb 15, 2017

@smits @psoetens @snrkiwi

I am not sure anymore if this patch, which is part of orocos-toolchain#91 merged into toolchain-2.9 (and rdt-next), was actually a good idea. Locking the mutex in Activity::trigger() or ExecutionEngine::process() can cause priority inversion issues: a high-priority thread calling trigger() can now block on msg_lock while a lower-priority thread holds it. I do not remember anymore what exact issue triggered this patch, but from the man page of pthread_cond_broadcast I do not see a reason why the original code should have blocked the thread signalling the condition variable.

Simplified and condensed pseudo code:

// Activity 1 ("the caller") calls one of our operations...
bool ExecutionEngine::process( DisposableInterface* c ) {
  bool result = mqueue->enqueue( c );
  this->getActivity()->trigger();

  // no os::MutexLock lock(msg_lock) without this patch!!!
  msg_cond.broadcast(); // required for waitAndProcessMessages() (EE thread)
  return result;
}

// ...which triggers the runner activity 2 ("the runner")...
bool Activity::trigger() {
  // no os::MutexLock lock(msg_lock) without this patch!!!
  msg_cond.broadcast(); // not the same msg_cond as in ExecutionEngine
  return true;
}

// ...and waits for the completion of the operation.
bool ExecutionEngine::waitForMessagesInternal(boost::function<bool(void)> const& pred)
{
  // only to be called from the thread not executing step().
  os::MutexLock lock(msg_lock);
  while (!pred()) { // the mutex guards that processMessages can not run between !pred and the wait().
    msg_cond.wait(msg_lock); // now processMessages may run.
  }
}

// Meanwhile, the activity 2 ("the runner") was waiting in...
void Activity::loop() {
  // non-periodic case only
  while(not stopping) {
    runner->work(...);

    os::MutexLock lock(msg_lock);
    msg_cond.wait(msg_lock);
  }
}

// ...and processes the queue:
void ExecutionEngine::work(...) {
  // ...
  processMessages();
}
void ExecutionEngine::processMessages() {
  DisposableInterface* com(0);
  {
    while ( mqueue->dequeue(com) ) {
      com->executeAndDispose();
    }
    // there's no need to hold the lock while
    // emptying the queue. But we must hold the
    // lock once between executeAndDispose and the
    // broadcast to avoid the race condition in
    // waitForMessages().
    // This allows us to recurse into processMessages.
    MutexLock locker( msg_lock );
  }
  if ( com )
    msg_cond.broadcast(); // required for waitForMessages() (3rd party thread)
}

The interesting part is the comment in processMessages(). Maybe something similar was the case in processFunctions() and this is the only lock that was really required?
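
To make that comment concrete, here is a portable sketch of the interleaving the empty-scope lock prevents (std::atomic stands in for the lock-free MQueue; all names are invented for the sketch):

#include <atomic>
#include <condition_variable>
#include <mutex>

std::mutex msg_lock;
std::condition_variable msg_cond;
std::atomic<int> pending(0); // stand-in for the lock-free message queue

// 3rd-party thread: waits until the queue has been drained.
void wait_for_messages() {
  std::unique_lock<std::mutex> lock(msg_lock);
  while (pending.load() != 0) // pred() is evaluated under msg_lock
    msg_cond.wait(lock);
}

// EE thread: drains the queue without holding msg_lock.
void process_messages() {
  while (pending.load() != 0)
    pending.fetch_sub(1); // "executeAndDispose" stand-in

  // Without the next line this interleaving loses the wake-up:
  //   waiter:    sees pending != 0 and decides to wait
  //   processor: drains the queue and broadcasts  <- waiter not waiting yet
  //   waiter:    calls wait() and sleeps forever
  // Acquiring msg_lock once serializes with the waiter's atomic
  // unlock-and-sleep inside wait(), so the broadcast cannot be missed.
  { std::lock_guard<std::mutex> lock(msg_lock); }
  msg_cond.notify_all();
}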

I think there is still a race condition in the current Activity implementation, with or without the mutex: if trigger() is called between runner->work(...) and msg_cond.wait() in Activity::loop(), the trigger is lost.
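
One conventional fix (an assumption on my part, not necessarily what RTT adopted) is to record the trigger in a flag guarded by the same mutex, so a trigger that arrives while the runner is still in work() is not lost:

#include <condition_variable>
#include <mutex>

std::mutex msg_lock;
std::condition_variable msg_cond;
bool trigger_pending = false;
bool stopping = false;

void trigger() {
  std::lock_guard<std::mutex> lock(msg_lock);
  trigger_pending = true; // remembered even if loop() is still in work()
  msg_cond.notify_all();
}

void loop() {
  for (;;) {
    // runner->work(...) would run here, outside the lock
    std::unique_lock<std::mutex> lock(msg_lock);
    msg_cond.wait(lock, [] { return trigger_pending || stopping; });
    if (stopping) break;
    trigger_pending = false; // consume the trigger, then loop back to work()
  }
}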

The second part of the patch (terminate the thread in the Activity destructor) is certainly valid. It should have been a separate commit.
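
A minimal portable illustration of why the join has to happen in the most-derived destructor (std::thread stand-ins, not RTT's actual class hierarchy):

#include <atomic>
#include <thread>

struct ThreadBase {
  std::atomic<bool> running;
  std::thread th;
  ThreadBase() : running(true) {}
  virtual void step() = 0; // overridden by the derived class
  void start() { th = std::thread([this] { while (running) step(); }); }
  void terminate() { running = false; if (th.joinable()) th.join(); }
  virtual ~ThreadBase() { terminate(); } // too late: the derived part is gone
};

struct Activity : ThreadBase {
  void step() override { /* do work */ }
  // Joining here, while the Activity part is still alive, is what the patch
  // does. Once ~Activity() has finished, the vtable reverts to ThreadBase,
  // and a late call to step() aborts with "pure virtual method called".
  ~Activity() { terminate(); }
};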

Main PR: orocos-toolchain#91

meyerj referenced this pull request in orocos-toolchain/rtt Oct 16, 2017
Mutex wasn't always locked, so rtos_wait_cond would wait forever