Fix segfault in JTC and simplify checks #423

JafarAbdi · 2022-08-23T21:52:02Z

This PR fixes a segfault in JTC, we were getting a segfault every time we deactivate and then re-activate the controllers.

Attaching a debugger to the node shows read_state_from_hardware(state_desired_) causing the segfault, for some reason the variable in empty, digging more I don't see this variable being modified much, but I suspect it's related to the call to sample which set it to the default value output_state = trajectory_msgs::msg::JointTrajectoryPoint()

destogl

Can you extend tests to catch this error, i.e., provoke this error so we see this is actually doing what it should?

Are you changing interfaces between reactivation cycles?

joint_trajectory_controller/src/joint_trajectory_controller.cpp

destogl · 2022-08-24T12:19:57Z

joint_trajectory_controller/src/joint_trajectory_controller.cpp

@@ -374,12 +374,17 @@ void JointTrajectoryController::read_state_from_hardware(JointTrajectoryPoint &
  auto assign_point_from_interface =
    [&](std::vector<double> & trajectory_point_interface, const auto & joint_interface)
  {
+    trajectory_point_interface.resize(dof_);


This is fine, but this should already have the right size.

For some reason, it's not (see the PR description for where I think the size is being changed) which is why it's segfaulting

@destogl managed to find the cause? We could open an investigation ticket but I'm keen on merging this soon

Not sure if this info is useful or not, but I encountered this issue when doing the following (using the latest master):

load the JTC into inactivate state from the CLI using the --set-state configured option

unload the JTC immediately

The result was a segfault in every single case. No reactivation needed, as the controller was not activated at all. On the other hand, if we load it into unconfigured state, and then attempt to unload it, the segfault doesn't occur.
Neither when we're loading the JTC into inactive state -> activating it -> deactivating it (JTC is back into inactive state) and only then unloading it.

@VX792 Did this PR resolve the issue for you? If yes, do you mind creating a test that trigger the segfault? I no longer have access to the codebase that reproduce this issue 😿

Nope, it didn't. It's entirely possible that we're talking about two different issues, since mine was 'solved' by a really quick fix.

The problem was that the traj_home_point_ptr_ variable is initially a nullptr, and it only gets initialized in the on_activate() function. If we're calling the on_cleanup() fn through a unload_controller request, until the activation doesn't happen, the cleanup will attempt to call a traj_home_point_ptr_->update(), trying to access a member of a nullptr. And that's a huge segfault!

We can check if the pointer is null and simply return (it works!), but I know nothing about control theory so @destogl might come up with a better solution.

JafarAbdi · 2022-08-24T14:22:54Z

Are you changing interfaces between reactivation cycles?

No, we don't, it's the same controller

destogl · 2022-08-26T06:50:54Z

I am not convinced that this actually solves the cause of the issue. I would say it's more about hiding the real issue. I would wait to reproduce this in the tests and then know what is causing this...

mergify · 2022-09-07T08:27:49Z

This pull request is in conflict. Could you fix it @JafarAbdi?

codecov-commenter · 2022-09-07T11:40:13Z

Codecov Report

Merging #423 (154bbd1) into master (e7f9962) will decrease coverage by 3.31%.
The diff coverage is 26.60%.

@@            Coverage Diff             @@
##           master     #423      +/-   ##
==========================================
- Coverage   35.78%   32.48%   -3.31%     
==========================================
  Files         189        7     -182     
  Lines       17570      665   -16905     
  Branches    11592      357   -11235     
==========================================
- Hits         6287      216    -6071     
+ Misses        994      157     -837     
+ Partials    10289      292    -9997

Flag	Coverage Δ
unittests	`32.48% <26.60%> (-3.31%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...ontroller/test/test_load_diff_drive_controller.cpp	`12.50% <0.00%> (ø)`
diff_drive_controller/src/odometry.cpp	`42.16% <11.11%> (ø)`
diff_drive_controller/src/speed_limiter.cpp	`46.55% <11.11%> (ø)`
...ive_controller/test/test_diff_drive_controller.cpp	`17.62% <12.08%> (ø)`
...troller/include/diff_drive_controller/odometry.hpp	`20.00% <20.00%> (ø)`
...iff_drive_controller/src/diff_drive_controller.cpp	`39.22% <35.50%> (ø)`
...de/diff_drive_controller/diff_drive_controller.hpp	`100.00% <100.00%> (ø)`
...ontrollers/src/joint_group_velocity_controller.cpp
...test/test_load_joint_group_velocity_controller.cpp
...include/joint_trajectory_controller/tolerances.hpp
... and 191 more

joint_trajectory_controller/test/test_trajectory_controller.cpp

bmagyar

Thanks for following up w tests

JafarAbdi · 2022-12-16T15:24:35Z

I'm sorry, I forgot to mention that the test I added has an issue. I had a call with Denis, and he asked me to push my WIP test, so he could take a look and see if he can help finish it!

…istic with trajectory execution.

destogl · 2023-01-24T22:07:21Z

I tried to merge this with the old reactivation test, where actual trajectory execution is done. The complexity there is that we change the controller to use input time and period in the update method, so their parameters should be adjusted.

First the test need to replicate the described issue and after that show that fix is working. The position, velocity and acceleration filed don't need to have the same size, i.e., they may be uninitialized if not used. But they should definitely be allocated in the configure method. The question here is when after reactivation we loose the initialization in the memory of those filed. That's confusing for me – this is the reason I think we have much significant issue then this fix proposes.

mechwiz · 2023-02-06T23:02:33Z

@swiz23 noticed a segfault in the JTC specifically when aborting a motion, followed by deactivating, followed by activating where the segfault would occur. After looking at this PR, I think it very much has to do with the line
output_state = trajectory_msgs::msg::JointTrajectoryPoint().
I think the flow of events occur as follows:

When aborting, set_hold_position() is called. That essentially adds and writes an empty trajectory message in order to disregard the current one.
When sample() is called in the update() loop, state_desired_ gets overwritten with that empty trajectory point meaning it has an empty position, velocity, and acceleration array.
The controllers get deactivated
The controllers get reactivated. When this happens, read_state_frome_hardware is called on state_desired_. Within that function, we try to index the position array within state_deisred_ as if it has the same size as dof_ when it in fact no longer does. This causes the segault.

The solution is to simply move the line defining the output_state within the sample() function in trajectory.cpp to after the last possible place that the function can return false -> here. I tested this out at least for the case that @swiz23 had issues with and verified that there was no longer a segfault.

JafarAbdi · 2023-02-07T07:50:27Z

Thanks @mechwiz for the fix. I'm closing this since someone else who currently can reproduce the issue have a fix

github-actions bot requested review from bmagyar, destogl, erickisos, harderthan, livanov93, progtologist and rosterloh August 23, 2022 21:52

destogl requested changes Aug 24, 2022

View reviewed changes

JafarAbdi force-pushed the pr-segfault branch from 978ba10 to d7d1fd9 Compare September 7, 2022 11:32

JafarAbdi force-pushed the pr-segfault branch 2 times, most recently from 547ff4e to d7d1fd9 Compare October 12, 2022 16:57

Fix segfault in JTC and simplify checks

8b708cb

JafarAbdi force-pushed the pr-segfault branch from d7d1fd9 to 8b708cb Compare December 7, 2022 21:34

JafarAbdi added 2 commits December 8, 2022 14:11

Add a test to reproduce the bug

b7c7992

Add a test to reproduce the bug

1d6a70a

erickisos approved these changes Dec 16, 2022

View reviewed changes

Merge branch 'master' into pr-segfault

b940773

bmagyar reviewed Dec 16, 2022

View reviewed changes

joint_trajectory_controller/test/test_trajectory_controller.cpp Outdated Show resolved Hide resolved

Apply suggestions from code review

da39f86

bmagyar approved these changes Dec 16, 2022

View reviewed changes

sea-bass mentioned this pull request Jan 12, 2023

Should admittance_controller also depend on kinematics_interface_kdl package? #488

Closed

bmagyar and others added 3 commits January 20, 2023 13:56

Merge branch 'master' into pr-segfault

154bbd1

Merge branch 'master' into pr-segfault

60aa928

Merge reactivation test with the old one and try to make it more real…

5aa6df8

…istic with trajectory execution.

VX792 mentioned this pull request Feb 4, 2023

Fix JTC segfault on unload #515

Merged

mechwiz mentioned this pull request Feb 6, 2023

Fix JTC segfault #518

Merged

JafarAbdi closed this Feb 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix segfault in JTC and simplify checks #423

Fix segfault in JTC and simplify checks #423

JafarAbdi commented Aug 23, 2022

destogl left a comment

destogl Aug 24, 2022

JafarAbdi Aug 24, 2022

bmagyar Dec 16, 2022

VX792 Feb 4, 2023

JafarAbdi Feb 4, 2023

VX792 Feb 4, 2023 •

edited

Loading

JafarAbdi commented Aug 24, 2022

destogl commented Aug 26, 2022

mergify bot commented Sep 7, 2022

codecov-commenter commented Sep 7, 2022 •

edited

Loading

bmagyar left a comment

JafarAbdi commented Dec 16, 2022

destogl commented Jan 24, 2023

mechwiz commented Feb 6, 2023 •

edited

Loading

JafarAbdi commented Feb 7, 2023

Fix segfault in JTC and simplify checks #423

Fix segfault in JTC and simplify checks #423

Conversation

JafarAbdi commented Aug 23, 2022

destogl left a comment

Choose a reason for hiding this comment

destogl Aug 24, 2022

Choose a reason for hiding this comment

JafarAbdi Aug 24, 2022

Choose a reason for hiding this comment

bmagyar Dec 16, 2022

Choose a reason for hiding this comment

VX792 Feb 4, 2023

Choose a reason for hiding this comment

JafarAbdi Feb 4, 2023

Choose a reason for hiding this comment

VX792 Feb 4, 2023 • edited Loading

Choose a reason for hiding this comment

JafarAbdi commented Aug 24, 2022

destogl commented Aug 26, 2022

mergify bot commented Sep 7, 2022

codecov-commenter commented Sep 7, 2022 • edited Loading

Codecov Report

bmagyar left a comment

Choose a reason for hiding this comment

JafarAbdi commented Dec 16, 2022

destogl commented Jan 24, 2023

mechwiz commented Feb 6, 2023 • edited Loading

JafarAbdi commented Feb 7, 2023

VX792 Feb 4, 2023 •

edited

Loading

codecov-commenter commented Sep 7, 2022 •

edited

Loading

mechwiz commented Feb 6, 2023 •

edited

Loading