mrouted segfault after several hours #56
Hmm, yeah, that's not right; it obviously shouldn't segfault, regardless of your network setup. The failure to delete the APIPA route could be related, but there are quite a few hours between that log entry and the segfault. My recommendation is to rebuild with GDB debug flags, start mrouted manually (or run
Start it ... wait for crash
The
I've set up a long-term test in my home network to see if I can reproduce it, but it would be really helpful if you could get the backtrace (
Hi again. Unfortunately, I haven't been able to reproduce your crash in my (limited) setup at home. I'll leave this issue open so that other people can chime in as well.
Hi. With the setup I have at the moment, this problem occurs a few times an hour. I've looked at multiple core dumps and the crash occurs in the same place, which is good. Please find the stack backtrace below:
Here is also the configuration file referenced on the command line:
The mrouted version is 4.5. mrouted logs no errors or warnings to syslog.
Thank you @jjr-simiatec for the backtrace! That code path is entered when a group subscription times out, i.e., when a receiver stops sending IGMP messages to "join" the given group. I see now that the
Ensure all group timers are stopped when stopping interfaces. Fixes #56 Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
@jjr-simiatec I've just pushed a branch
Hi @troglobit. That was some fast work! I've tested the version on the
I have added the
Again, across multiple runs the stack traces at the point of failure are identical. Regarding your question about a possible cause: the test is free-running, without user interaction, and there is nothing in the logs to indicate that link state changes are occurring either. I hope this provides a little more insight into the problem!
Great stuff, thank you so much! This was really helpful, and it also confirms a nagging thought I had: the callback seems to be called twice, which was also the only logical option left. While debugging this over the weekend, I also noticed that the internal time was off by a factor of two, so I started looking at the timer code again and... well, it's a legacy implementation that will have to go. Fortunately, all of these multicast routers were built on top of each other around the same time, so I have another implementation that I know works. It'll take a few evenings to sort out, though, so I can't give you an estimate of when I'll have something new for you to test.
No problem, I'll be ready to test when you are.
The existing timer implementation was too imprecise and jittered several seconds between query intervals. This patch fixes that by replacing it with the Public Domain library Portable Events (pev). Issue #56 Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
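Jitter like that described in the commit above is typical of timers that re-arm relative to whenever the callback happened to run. One common remedy, shown here as a hedged sketch, is to keep an absolute deadline on the monotonic clock and advance it by the interval, so a late wake-up does not shift every later expiry. This is not the pev API; `struct periodic`, `monotonic_now()`, `periodic_start()` and `periodic_poll()` are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical periodic timer; illustrates the technique, not pev's API. */
struct periodic {
    struct timespec deadline;    /* absolute CLOCK_MONOTONIC expiry */
    long            interval_ms; /* period, e.g. a query interval */
};

/* Read the monotonic clock, which does not jump when the wall clock is set. */
void monotonic_now(struct timespec *ts)
{
    clock_gettime(CLOCK_MONOTONIC, ts);
}

static void ts_add_ms(struct timespec *ts, long ms)
{
    ts->tv_sec  += ms / 1000;
    ts->tv_nsec += (ms % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L) {
        ts->tv_sec++;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Non-zero if a is strictly earlier than b. */
static int ts_before(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec < b->tv_sec ||
           (a->tv_sec == b->tv_sec && a->tv_nsec < b->tv_nsec);
}

void periodic_start(struct periodic *p, const struct timespec *now, long interval_ms)
{
    assert(interval_ms > 0);
    p->deadline = *now;
    p->interval_ms = interval_ms;
    ts_add_ms(&p->deadline, interval_ms);
}

/* Call on every event-loop wake-up. Returns the number of periods that
 * have elapsed (0 if not yet due). The next deadline is advanced from the
 * previous deadline, not from 'now', so late wake-ups do not accumulate
 * into drift. */
int periodic_poll(struct periodic *p, const struct timespec *now)
{
    int elapsed = 0;

    while (!ts_before(now, &p->deadline)) {
        ts_add_ms(&p->deadline, p->interval_ms);
        elapsed++;
    }
    return elapsed;
}
```

For example, with IGMP's default 125 s Query Interval started at t = 100 s, a wake-up 2 s late (at 227 s) still schedules the next expiry at 350 s, whereas re-arming relative to the wake-up would push it to 352 s. Returning the number of elapsed periods lets the caller decide whether to catch up or simply run once.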
@jjr-simiatec This took a bit longer than expected, but I found a few other nasty things, as well as a potential speedup at startup. Hope the latest commits on the
@troglobit Looks like a lot of great work! I recreated the test setup with the original version of mrouted to verify that I could still trigger the issue, and then dropped in your new version. It's rock-solid and has been for hours; can't argue with results! The only issue I found is when mrouted is compiled with 64-bit time support on a 32-bit Linux platform by adding the options If there are any other aspects you would like me to test, please let me know.
Glad to hear it works better now, finally! 🎉 🥳 Then I'll go ahead and clean up the commits, squash, and merge to
This sounds very interesting! Please go ahead and report that as a separate issue and I'll have a look. (Massive Buildroot user and maintainer here 👋 :)
Nothing that comes to mind at the moment. Thank you so much for helping out with testing and nailing this one down!
Ensure all group timers are stopped when stopping interfaces. Issue #56 Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Hello! I currently have a setup running mrouted 4.4 on two interfaces, on a brand-new, command-line-only installation of Debian 11.
I downloaded the 4.4 package from deb.troglobit.com.
I have made no modifications to mrouted.conf; I just launch the daemon and let it do its thing on the two interfaces.
Please excuse my lack of knowledge, as I am neither a Linux pro nor a developer. :(
My two interfaces are:
I can see that the daemon crashes, and I pulled the syslog from that moment:
Restarting the daemon allows it to run for another few hours, but then it crashes again. I previously had mrouted running on an Ubuntu VM, and the same issue happened there. I definitely suspect that something strange on MY network is affecting the daemon. Perhaps the strange APIPA route deletion requests are doing something to it?
My Linux knowledge is very minimal, but if I'm given the steps, I will try to get it done for you. I am willing to debug or do whatever is needed. My goal is simply to have a stable daemon that I don't need to restart every so often. I appreciate any help that can be given!