Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

log: Fix threading races #396

Merged
merged 1 commit into from
Jun 1, 2020
Merged

log: Fix threading races #396

merged 1 commit into from
Jun 1, 2020

Conversation

chrissie-c
Copy link
Contributor

It seems we can't avoiding locking in qb_log_dcs_get because
qb_array_get() isn't thread-safe.

Alternatives to this would be to add a - new - implementation of
the arrays or to stick the lock higher up (as in this patch).
We're logging here, so doing I/O, plus all the formatting
that goes on, I think the locking is a minimal intrusion on the CPU
relatively speaking.

This bug was seen in the test suite on BSD so it's not just theoretical

also wthread_should_exit was unprotected.

@chrissie-c
Copy link
Contributor Author

Note on the reproducibility of this. Currently I can only reproduce it on FreeBSD-devel by running the log.test repeatedly. Roughly 1 in 100 (but it can take longer) will fail in test_log_threads. Two threads both in qb_array_index() but at the different places in qb_log_dcs_get().

It's possible that cs->filename or cs->format could be read
in the 'fast' path while the 'slow' path is still constructing
the object. So we need to lock arr_next_lock before copying them
out for the caller.

Also wthread_should_exit was unprotected.
@chrissie-c
Copy link
Contributor Author

Update (the previous update was from last week but I posted it in the wrong PR).

The bug seems to be the possibility of using a 'cs' object that was not fully constructed. I've moved the lock line down one because the problem isn't actually in qb_array_index() but below that where we copy the values into safe_filename and safe_format. cs might be still being constructed in _log_dcs_new_cs() (which is protected by the lock).

Copy link
Member

@jfriesse jfriesse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, not even sure how it is possible that code was working without locking.

@chrissie-c
Copy link
Contributor Author

Just good luck I suspect!

Thanks for the review.

@chrissie-c chrissie-c merged commit 4964193 into ClusterLabs:master Jun 1, 2020
@chrissie-c chrissie-c deleted the logthread-fix branch June 1, 2020 14:42
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants