Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Implement heap based timer list #439

Merged
merged 4 commits into from
Mar 18, 2021
Merged

Conversation

jfriesse
Copy link
Member

Previous timer was sorted list implementation of priority queue and very slow when number of timers increased. This is mostly not a problem because usually only few timers are used. But for application where bigger number of timers are needed it may become problem.

Solution is to use binary heap based priority queue which is much faster.

API is unchanged, just timerlist_destroy is added which should be called to free heap array. This function also destroys mutex (omitted when mutex was added).

It's worth noting that qb loop timer is using qb array which is (by default right now) limited to 65 536 entries and it is doing its own mapping of tlist entries and searching by simple for cycle (so lineary) so speed improvement here is really not big for real app.

I've used check_tlist.cto measure improvement:

  • SPEED_TEST_NO_ITEMS set to 40 000
    Previous implementation:
Running suite(s): tlist
100%: Checks: 2, Failures: 0, Errors: 0
check_tlist.c:107:P:check_basic:test_check_basic:0: Passed
check_tlist.c:156:P:check_speed:test_check_speed:0: Passed

real	0m15.323s
user	0m14.692s
sys	0m0.270s

New implementation:

Running suite(s): tlist
100%: Checks: 2, Failures: 0, Errors: 0
check_tlist.c:113:P:check_basic:test_check_basic:0: Passed
check_tlist.c:164:P:check_speed:test_check_speed:0: Passed

real	0m0.761s
user	0m0.229s
sys	0m0.233s

For default (10000) speedup is not really that impressive:
Original:

real	0m1.253s
user	0m0.857s
sys	0m0.095s

Patched:

real	0m0.459s
user	0m0.079s
sys	0m0.085s

No matter what, I believe the extra code is worth - if not, then test suite for sure is ;)

@jfriesse jfriesse requested a review from chrissie-c March 10, 2021 16:25
@jnpkrn
Copy link
Contributor

jnpkrn commented Mar 10, 2021

Is it now better to just deactivate the timer and recycle it later on,
since it's the act of removal that possibly becomes costlier, while
skipping over dead timers won't have a noticeable performance impact
now?

Library users would definitely benefit from some best practices of
how to deal with loop timers rationally when such insights from the
implementers/maintainers are shared.

@jnpkrn
Copy link
Contributor

jnpkrn commented Mar 10, 2021

It's unrelated to this change but little value in commentng on
a multiple disparate closed PRs, so excuse my rhetorical call
here in case it can inform future directions:

Any reasons why there is no consistent use of qb_thread_* functions
vs. mixing native pthread_* ones in?

And were atomic variable operations considered with those recent
locking additions as an alternative?

@jfriesse jfriesse force-pushed the timer-heap branch 2 times, most recently from db6fe0b to 5e3e5ae Compare March 11, 2021 13:54
@jfriesse
Copy link
Member Author

Is it now better to just deactivate the timer and recycle it later on,
since it's the act of removal that possibly becomes costlier, while
skipping over dead timers won't have a noticeable performance impact
now?

There is no change in this respect. "Just deactivate" timer is always faster, but that's not point of this PR. I have a version with freelist handy but I rather go step by step.

Library users would definitely benefit from some best practices of
how to deal with loop timers rationally when such insights from the
implementers/maintainers are shared.

@jfriesse
Copy link
Member Author

Added more locking so hopefully it is now lock safe.

@chrissie-c
Copy link
Contributor

helgrind found a few things (it took several runs though!)

==23465== Helgrind, a thread error detector
==23465== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==23465== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==23465== Command: .libs/loop.test
==23465== Parent PID: 1991
==23502== ---Thread-Announcement------------------------------------------
==23502==
==23502== Thread #2 was created
==23502== at 0x4AD1B42: clone (in /usr/lib64/libc-2.32.so)
==23502== by 0x49B42E1: create_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x49B5C20: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x484153F: pthread_create_WRK (hg_intercepts.c:436)
==23502== by 0x484263D: pthread_create@* (hg_intercepts.c:469)
==23502== by 0x404198: test_loop_timer_threads_fn (check_loop.c:469)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502==
==23502== ---Thread-Announcement------------------------------------------
==23502==
==23502== Thread #1 is the program's root thread
==23502==
==23502== ----------------------------------------------------------------
==23502==
==23502== Possible data race during read of size 8 at 0x409200 by thread #2
==23502== Locks held: none
==23502== at 0x404878: loop_timer_thread (check_loop.c:451)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502==
==23502== This conflicts with a previous write of size 8 by thread #1
==23502== Locks held: none
==23502== at 0x485AE8E: qb_loop_timer_add (loop_timerlist.c:232)
==23502== by 0x404229: test_loop_timer_threads_fn (check_loop.c:477)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502== Address 0x409200 is 0 bytes inside data symbol "test_th"
==23502==
==23502== ----------------------------------------------------------------
==23502==
==23502== Possible data race during read of size 4 at 0x4F72E34 by thread #2
==23502== Locks held: none
==23502== at 0x485B2C6: timer_from_handle (loop_timerlist.c:147)
==23502== by 0x485B2C6: qb_loop_timer_expire_time_get (loop_timerlist.c:291)
==23502== by 0x485B3B8: qb_loop_timer_is_running (loop_timerlist.c:356)
==23502== by 0x404886: loop_timer_thread (check_loop.c:451)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502==
==23502== This conflicts with a previous write of size 4 by thread #1
==23502== Locks held: none
==23502== at 0x485AE71: qb_loop_timer_add (loop_timerlist.c:224)
==23502== by 0x404229: test_loop_timer_threads_fn (check_loop.c:477)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502== Address 0x4f72e34 is 308 bytes inside a block of size 1,280 alloc'd
==23502== at 0x483DB89: calloc (vg_replace_malloc.c:760)
==23502== by 0x4858C5B: qb_array_index (array.c:153)
==23502== by 0x485AE0C: qb_loop_timer_add (loop_timerlist.c:205)
==23502== by 0x404861: loop_timer_thread (check_loop.c:448)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502== Block was alloc'd by thread #2
==23502==
==23502== ----------------------------------------------------------------
==23502==
==23502== Lock at 0x4F72298 was first observed
==23502== at 0x484299B: pthread_mutex_init (hg_intercepts.c:796)
==23502== by 0x485ACB8: qb_loop_timer_create (loop_timerlist.c:110)
==23502== by 0x4858DF5: qb_loop_create (loop.c:103)
==23502== by 0x404169: test_loop_timer_threads_fn (check_loop.c:466)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502== Address 0x4f72298 is 104 bytes inside a block of size 144 alloc'd
==23502== at 0x483B8A9: malloc (vg_replace_malloc.c:307)
==23502== by 0x485AC33: qb_loop_timer_create (loop_timerlist.c:99)
==23502== by 0x4858DF5: qb_loop_create (loop.c:103)
==23502== by 0x404169: test_loop_timer_threads_fn (check_loop.c:466)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502== Block was alloc'd by thread #1
==23502==
==23502== Possible data race during read of size 4 at 0x4F72E30 by thread #2
==23502== Locks held: none
==23502== at 0x485B2CD: qb_loop_timer_expire_time_get (loop_timerlist.c:296)
==23502== by 0x485B3B8: qb_loop_timer_is_running (loop_timerlist.c:356)
==23502== by 0x404886: loop_timer_thread (check_loop.c:451)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502==
==23502== This conflicts with a previous write of size 4 by thread #1
==23502== Locks held: 1, at address 0x4F72298
==23502== at 0x485AE3B: qb_loop_timer_add (loop_timerlist.c:206)
==23502== by 0x404229: test_loop_timer_threads_fn (check_loop.c:477)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502== Address 0x4f72e30 is 304 bytes inside a block of size 1,280 alloc'd
==23502== at 0x483DB89: calloc (vg_replace_malloc.c:760)
==23502== by 0x4858C5B: qb_array_index (array.c:153)
==23502== by 0x485AE0C: qb_loop_timer_add (loop_timerlist.c:205)
==23502== by 0x404861: loop_timer_thread (check_loop.c:448)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502== Block was alloc'd by thread #2
==23502==
==23502== ----------------------------------------------------------------
==23502==
==23502== Possible data race during read of size 8 at 0x4F72E28 by thread #2
==23502== Locks held: none
==23502== at 0x485B2E0: timerlist_expire_time (tlist.h:375)
==23502== by 0x485B2E0: qb_loop_timer_expire_time_get (loop_timerlist.c:300)
==23502== by 0x485B3B8: qb_loop_timer_is_running (loop_timerlist.c:356)
==23502== by 0x404886: loop_timer_thread (check_loop.c:451)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502==
==23502== This conflicts with a previous write of size 8 by thread #1
==23502== Locks held: none
==23502== at 0x485AF45: timerlist_add_duration (tlist.h:346)
==23502== by 0x485AF45: qb_loop_timer_add (loop_timerlist.c:234)
==23502== by 0x404229: test_loop_timer_threads_fn (check_loop.c:477)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502== Address 0x4f72e28 is 296 bytes inside a block of size 1,280 alloc'd
==23502== at 0x483DB89: calloc (vg_replace_malloc.c:760)
==23502== by 0x4858C5B: qb_array_index (array.c:153)
==23502== by 0x485AE0C: qb_loop_timer_add (loop_timerlist.c:205)
==23502== by 0x404861: loop_timer_thread (check_loop.c:448)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502== Block was alloc'd by thread #2
==23502==
==23502== ----------------------------------------------------------------
==23502==
==23502== Possible data race during read of size 8 at 0x4F75310 by thread #2
==23502== Locks held: none
==23502== at 0x485B2E4: timerlist_expire_time (tlist.h:375)
==23502== by 0x485B2E4: qb_loop_timer_expire_time_get (loop_timerlist.c:300)
==23502== by 0x485B3B8: qb_loop_timer_is_running (loop_timerlist.c:356)
==23502== by 0x404886: loop_timer_thread (check_loop.c:451)
==23502== by 0x4841737: mythread_wrapper (hg_intercepts.c:398)
==23502== by 0x49B53F8: start_thread (in /usr/lib64/libpthread-2.32.so)
==23502== by 0x4AD1B52: clone (in /usr/lib64/libc-2.32.so)
==23502== Address 0x4f75310 is 0 bytes inside a block of size 48 alloc'd
==23502== at 0x483B8A9: malloc (vg_replace_malloc.c:307)
==23502== by 0x485AE9E: timerlist_add_duration (tlist.h:329)
==23502== by 0x485AE9E: qb_loop_timer_add (loop_timerlist.c:234)
==23502== by 0x404229: test_loop_timer_threads_fn (check_loop.c:477)
==23502== by 0x49A5E88: srunner_run_tagged (in /usr/lib64/libcheck.so.0.15.2)
==23502== by 0x402749: main (check_loop.c:829)
==23502== Block was alloc'd by thread #1
==23502==

@jfriesse
Copy link
Member Author

Helgrind error seemed to be bug in check_loop, so I've pushed fix for check_loop.

Copy link
Contributor

@chrissie-c chrissie-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to e. The only thing I can see that needs changing is that the comment on 170, 'childs' should be 'children' :)

Speed test is adding only 10000 items so it is reasonable
fast even with sorted linked list implementation.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Previous timer was sorted list implementation of priority queue
and very slow when number of timers increased. This is mostly
not a problem because usually only few timers are used.
But for application where bigger number of timers are needed
it may become problem.

Solution is to use binary heap based priority queue which is much
faster.

API is unchanged, just timerlist_destroy is added which should be called
to free heap array. This function also destroys mutex (omitted when
mutex was added).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
test_th was accesed both by main thread and loop_timer thread resulting in
failure. Fix is to access test_tht in loop_timer thread.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
@chrissie-c chrissie-c merged commit 48fff5e into ClusterLabs:master Mar 18, 2021
@chrissie-c
Copy link
Contributor

An excellent improvement, thanks!

bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this pull request Nov 18, 2021
https://build.opensuse.org/request/show/931807
by user yan_gao + dimstar_suse
- Update to version 2.0.4+20211112.a2691b9 (v2.0.4):
- poll: Don't log in a signal handler (gh#ClusterLabs/libqb##447)
- Fix pthread returns (gh#ClusterLabs/libqb#444)
- doxygen2man: print structure descriptions (gh#ClusterLabs/libqb#443)
- Implement heap based timer list (gh#ClusterLabs/libqb#439) (forwarded request 931806 from yan_gao)
kraj pushed a commit to YoeDistro/meta-openembedded that referenced this pull request Nov 22, 2021
The most important fix in this release is that we no longer log errors inside
the signal handler in loop_poll.c
This could cause an application hang in some circumstances.

Changelog is as follows:
doxygen2man: print structure descriptions
(ClusterLabs/libqb#443)

Fix pthread returns
(ClusterLabs/libqb#444)

poll: Don't log in a signal handler
(ClusterLabs/libqb#447)

Bump library version for v2.0.4

Implement heap based timer list
(ClusterLabs/libqb#439)

build: Fix undefined pthread reference.
(ClusterLabs/libqb#440)

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
daregit pushed a commit to daregit/yocto-combined that referenced this pull request May 22, 2024
The most important fix in this release is that we no longer log errors inside
the signal handler in loop_poll.c
This could cause an application hang in some circumstances.

Changelog is as follows:
doxygen2man: print structure descriptions
(ClusterLabs/libqb#443)

Fix pthread returns
(ClusterLabs/libqb#444)

poll: Don't log in a signal handler
(ClusterLabs/libqb#447)

Bump library version for v2.0.4

Implement heap based timer list
(ClusterLabs/libqb#439)

build: Fix undefined pthread reference.
(ClusterLabs/libqb#440)

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
daregit pushed a commit to daregit/yocto-combined that referenced this pull request May 22, 2024
The most important fix in this release is that we no longer log errors inside
the signal handler in loop_poll.c
This could cause an application hang in some circumstances.

Changelog is as follows:
doxygen2man: print structure descriptions
(ClusterLabs/libqb#443)

Fix pthread returns
(ClusterLabs/libqb#444)

poll: Don't log in a signal handler
(ClusterLabs/libqb#447)

Bump library version for v2.0.4

Implement heap based timer list
(ClusterLabs/libqb#439)

build: Fix undefined pthread reference.
(ClusterLabs/libqb#440)

Signed-off-by: Wang Mingyu <wangmy@fujitsu.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants