Fix segfault in JTC and simplify checks #423
Can you extend tests to catch this error, i.e., provoke this error so we see this is actually doing what it should?
Are you changing interfaces between reactivation cycles?
joint_trajectory_controller/src/joint_trajectory_controller.cpp
@@ -374,12 +374,17 @@ void JointTrajectoryController::read_state_from_hardware(JointTrajectoryPoint &
auto assign_point_from_interface =
[&](std::vector<double> & trajectory_point_interface, const auto & joint_interface)
{
trajectory_point_interface.resize(dof_);
This is fine, but this should already have the right size.
For some reason it's not (see the PR description for where I think the size is being changed), which is why it's segfaulting.
@destogl, did you manage to find the cause? We could open an investigation ticket, but I'm keen on merging this soon.
Not sure if this info is useful, but I encountered this issue when doing the following (using the latest master):
- load the JTC into the inactive state from the CLI using the --set-state configured option
- unload the JTC immediately
The result was a segfault every single time. No reactivation was needed, as the controller was never activated. On the other hand, if we load it into the unconfigured state and then attempt to unload it, the segfault doesn't occur.
Nor does it occur when we load the JTC into the inactive state, activate it, deactivate it (so the JTC is back in the inactive state), and only then unload it.
@VX792 Did this PR resolve the issue for you? If yes, would you mind creating a test that triggers the segfault? I no longer have access to the codebase that reproduces this issue 😿
Nope, it didn't. It's entirely possible that we're talking about two different issues, since mine was 'solved' by a really quick fix.
The problem was that the traj_home_point_ptr_ variable is initially a nullptr and only gets initialized in the on_activate() function. If on_cleanup() is called through an unload_controller request before activation has happened, the cleanup attempts to call traj_home_point_ptr_->update(), accessing a member through a nullptr. And that's a huge segfault!
We can check whether the pointer is null and simply return (it works!), but I know nothing about control theory, so @destogl might come up with a better solution.
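The null check described in this comment could look roughly like the sketch below. It is not the actual JTC code: `SketchController` and `HomeTrajectory` are hypothetical stand-ins, and only the member name `traj_home_point_ptr_` and the `on_activate()` / `on_cleanup()` call order come from the comment.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the trajectory type; not the real JTC class.
struct HomeTrajectory
{
  int update_count = 0;
  void update() { ++update_count; }
};

// Hypothetical, stripped-down controller with only the lifecycle calls discussed above.
class SketchController
{
public:
  // Mirrors the described behavior: the pointer is only created on activation.
  void on_activate() { traj_home_point_ptr_ = std::make_shared<HomeTrajectory>(); }

  // Returns true if the home trajectory was updated, false if the update was skipped.
  bool on_cleanup()
  {
    if (!traj_home_point_ptr_) {
      // Never activated: nothing to update, so return instead of dereferencing nullptr.
      return false;
    }
    traj_home_point_ptr_->update();
    return true;
  }

private:
  std::shared_ptr<HomeTrajectory> traj_home_point_ptr_;  // nullptr until on_activate()
};
```

With this guard, calling on_cleanup() on a controller that was configured but never activated skips the update, while the activate-then-cleanup path behaves as before. Whether skipping the update is the right behavior for the real controller is a separate question for the maintainers.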
No, we don't, it's the same controller
I am not convinced that this actually solves the cause of the issue. I would say it's more about hiding the real issue. I would wait to reproduce this in the tests and then know what is causing this...
This pull request is in conflict. Could you fix it @JafarAbdi?
Codecov Report
@@ Coverage Diff @@
## master #423 +/- ##
==========================================
- Coverage 35.78% 32.48% -3.31%
==========================================
Files 189 7 -182
Lines 17570 665 -16905
Branches 11592 357 -11235
==========================================
- Hits 6287 216 -6071
+ Misses 994 157 -837
+ Partials 10289 292 -9997
Flags with carried forward coverage won't be shown. Click here to find out more.
joint_trajectory_controller/test/test_trajectory_controller.cpp
Thanks for following up with tests.
I'm sorry, I forgot to mention that the test I added has an issue. I had a call with Denis, and he asked me to push my WIP test so he could take a look and see if he can help finish it!
I tried to merge this with the old reactivation test, where actual trajectory execution is done. The complexity there is that we changed the controller to use the input time and period in the update method, so their parameters should be adjusted. First the test needs to replicate the described issue, and after that show that the fix is working. The
@swiz23 noticed a segfault in the JTC specifically when aborting a motion, then deactivating, then activating, at which point the segfault would occur. After looking at this PR, I think it very much has to do with the line
The solution is simply to move the line defining output_state within the sample() function in trajectory.cpp to after the last possible place where the function can return false. I tested this for at least the case that @swiz23 had issues with and verified that there was no longer a segfault.
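The idea of resetting output_state only after the last early return can be illustrated with the sketch below. It is not the real sample() from trajectory.cpp: `Point` is a hypothetical stand-in for the message type, and the `has_trajectory` flag is an assumption that models every condition under which the function returns false.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for trajectory_msgs::msg::JointTrajectoryPoint.
struct Point
{
  std::vector<double> positions;
};

// Hypothetical sampler: `has_trajectory` stands for all conditions that make it return false.
bool sample(bool has_trajectory, std::size_t dof, Point & output_state)
{
  if (!has_trajectory) {
    // Early return: output_state is left untouched, so its vectors keep their size.
    return false;
  }
  // Reset only after the last early return, once we know the point will be filled.
  output_state = Point();
  output_state.positions.assign(dof, 0.0);
  return true;
}
```

If the reset were the first statement instead, a failed call would hand back a point with empty vectors, and a caller that later writes into those vectors by index would be out of bounds.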
Thanks @mechwiz for the fix. I'm closing this since someone else who can currently reproduce the issue has a fix.
This PR fixes a segfault in the JTC: we were getting a segfault every time we deactivated and then re-activated the controllers.
Attaching a debugger to the node shows read_state_from_hardware(state_desired_) causing the segfault because, for some reason, the variable is empty. Digging further, I don't see this variable being modified much, but I suspect it's related to the call to sample, which sets it to the default value: output_state = trajectory_msgs::msg::JointTrajectoryPoint()
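A minimal sketch of the suspected failure mode and of the resize guard this PR adds, under simplifying assumptions: `Point` is a hypothetical stand-in for the message type, and the joint interfaces are modeled as plain doubles rather than the controller's real state interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for trajectory_msgs::msg::JointTrajectoryPoint.
struct Point
{
  std::vector<double> positions;
};

// Simplified version of the lambda in the diff: joint_interface holds plain values here.
void assign_point_from_interface(
  std::vector<double> & trajectory_point_interface, const std::vector<double> & joint_interface)
{
  // Without this resize, writing by index into an empty vector is undefined behavior.
  trajectory_point_interface.resize(joint_interface.size());
  for (std::size_t i = 0; i < joint_interface.size(); ++i) {
    trajectory_point_interface[i] = joint_interface[i];
  }
}
```

A default-constructed point has empty vectors, so the resize is what makes the indexed writes safe. As the reviewers note, this guards the symptom and does not explain where the vector lost its size.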