-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Add new initial command stage to mutex all reads of the installspace. #391
Conversation
7436500
to
e9eb801
Compare
|
||
for env_loader_path in env_loader_paths: | ||
if logger: | ||
logger.out('Loading environment from: {}'.format(env_loader_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logger.out
could just be dropped if we don't like it; the conditional is because of the print_build_env
invocation, which doesn't have a logger to pass it.
The code changes and approach seem ok to me, though I share your concern about the performance impact. I wonder how the impact on performance changes as the workspace size increases. I'd like a once over by @jbohren before merging this though, as he's more familiar with the env loading code. |
The perf impact will be proportional to parallelism rather than number of packages, I think, since the impact will be having a bunch of packages which all need to start at once having to take turns using the installspace to set up their environment. That said, if there's any measurable impact as a result of this, it's us you can expect a fix from. :) Edit to add: Would like to get this merged and released as soon as possible, but as of this morning we're running it from source in production— will report back if there are any anomalies with it. |
We've been running this for the last couple days for our bundle builds, and the spurious failures have completely disappeared. I have a report from a ros-install-osx user who hit the problem on a multi-core Mac build, so I am keen to see this merged and released. I'd appreciate input particularly on the business of running the env setup for @jbohren Have you had a chance to look at this at all? |
This better aligns the function name with other similar helpers in catkin_tools.jobs.utils.
b9f72d9
to
9cf85dd
Compare
@wjwwood @jbohren Would be delighted for you to look at this soon so that we can get this merged and released— it does resolve an actual bug in the wild which affects large parallel builds. I have now renamed the new function to loadenv, made the FunctionStage name match, and also dropped this stage from clean jobs. I've tested with building, partially and fully cleaning, and re-building workspaces using this change, and have had no problems. This simplifies the overall diff and avoids some superstitious functionality. |
9cf85dd
to
22d2c82
Compare
Small fix at the behest of flake8; got a clean build now. |
ping |
I haven't looked at the implementation in detail, but I have been running with this patch on my dev machine for two weeks without any issues. So +1 from me. |
ping |
1 similar comment
ping |
I'm sorry @mikepurvis, I'm not likely to have time to review and merge this until after ROSCon. I'll do my best to have a look at it soon. Also, I don't think @jbohren ever got around to skimming it. |
Just wanting to make sure it doesn't fall off the radar altogether. It's pretty inconvenient for us not having these changes in the native I could just release a fork under a different name, or patch in the fix with quilt, but those are their own headaches. And I'd like to continue making contributions upstream, which will become considerably harder to justify if I'm also maintaining a fork. |
Sorry @mikepurvis, I'm trying to balance my time between this, rviz, and robot_model-ish stuff. I've recently spent some time on the others, but I haven't had a slice to spend on catkin tools yet. Hopefully I can do that this week. We still don't have any feedback from @jbohren on this. |
For anyone else interested, I've set up a PPA and released a version of https://launchpad.net/~mikepurvis/+archive/ubuntu/catkin Process was:
|
@wjwwood ping |
Since 0.4.3 got released (b538dea) without this fix included, I've updated my PPA to a new patch release: https://launchpad.net/~mikepurvis/+archive/ubuntu/catkin/+packages |
Ok, I've just tested this for a few hours today myself. Given a second code review from me (+1) and @mikepurvis's long term usage of catkin tools with this patch, I think I'm going to merge this now, without feedback from @jbohren. Sorry it took so long @mikepurvis. I'll try to avoid delays like this in the future. I'd also like to look into sharing ownership with others in the community so things don't get hung up waiting on the maintainers again (currently me and @jbohren have push rights). I want to try and include some more changes before doing a release, but I will avoid delaying too long before making one. |
Just released |
Thanks @mikepurvis ! |
I am not sure if this is the correct place to ask this, but I still get this error.
I am not really at home with CMake, so I have no idea where I would have to start fixing this. Could someone point me in the right direction? Thanks |
@emielsteerneman you might want to make a new issue since this cannot be reopened (as pull requests cannot be opened after merging). Also, you should include more information, like which version of ROS you're using and what version of catkin tools you're using. |
Thanks for your response Wjwwood, now I'll know what to next time I run into such an issue. Fortunately, I managed to fix my current issue. My editor didn't source all the required files, such as setup.bash. |
Hi! I'm trying to get Clion working with ROS, so I can step through code, and I'm getting the same error you were getting. Do you remember how/where you changed your editor to source setup.bash and other files? Thanks! |
Hey rarestg! I did not modify the editor. The trick is to open CLion via the terminal. It is explained here : https://www.comkieffer.com/clion-ros.html . Good luck! |
Old thread but had this error too ("No module named catkin.environment_cache") with CLion and full remote development (clion on mac, target on linux-vm). Solution was to set CMAKE_PREFIX_PATH in CMakeLists.txt:
|
@FransGH so crazy,it helps a lot |
This is the solution to #378 proposed in #378 (comment).
There's probably a small perf hit here which could be mitigated by going to a readers-writer lock, but it doesn't seem Python has one natively (example implementation), and the cost of switching the locked_resource scheme to support such a thing would increase complexity. Given how fast the loadenv stage is relative to the rest of the build, I don't feel it's worth it.
@jbohren @wjwwood
My validation methodology for this is as described in #378 (comment), with one small change— I've duplicated the test workspace and am running two instances of it in parallel, in the hopes of increasing overall entropy on the box and triggering problems.
As of July 24th, I've built ros_base ~2000 times across the two workspaces, with no failed builds. Previous tests with catkin_tools 0.4.2 get failures of the ros_base workspace every 100-200 builds (we see them much more frequently than this on our bundle builds, but those are much larger and more heavily parallelized than the ros_base workspace is).
Upon switching back to the master branch (4f5fa9d), I got the following failures in my two test ros_base workspaces. The error outputs below are manifestations of the underlying concurrency problem which this PR corrects.
Run 11 in the first workspace:
Run 107 in the second workspace:
Next failure on the 22nd run:
Then run 59: