Add option to use `ripgrep` for crawling the list of files #369

Conversation
Small update: I just realized that when using […] this is quite slow (especially for large repos), so it gives us even greater performance improvements (we can reduce the times on large repos by another 50% 🚀). I'm gonna update the results table with the new data.
I consider this a huge improvement. 😍
Wow! 🏎💨
Thanks for listing these out. This seems promising and very much worth pursuing. ⚡
Especially when using `ripgrep`, we don't want to spawn a lot of processes in parallel if the user has many projects open.
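The idea above can be sketched with a small concurrency limiter. This is illustrative only (not the PR's actual code, and `createLimiter`/`crawlWithRipgrep` are hypothetical names); it just shows how crawls for many open projects could be queued so that only a couple of `rg` processes run at once:

```javascript
// Hypothetical sketch: cap the number of crawler processes spawned at once.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= maxConcurrent || queue.length === 0) return;
    active++;
    const {task, resolve, reject} = queue.shift();
    // Promise.resolve().then(task) also catches synchronous throws
    Promise.resolve().then(task).then(resolve, reject).finally(() => {
      active--;
      next();
    });
  };

  // Returns a promise for the task's result; the task starts only when a slot frees up
  return function run(task) {
    return new Promise((resolve, reject) => {
      queue.push({task, resolve, reject});
      next();
    });
  };
}

// e.g. crawl every open project, but never more than two rg processes at a time:
// const limit = createLimiter(2);
// projectPaths.map(p => limit(() => crawlWithRipgrep(p)));
```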
The modified test was relying on the order in which two crawlers running in parallel start returning results. This has probably caused flakiness and unintended failures. To fix it, I'm forcing the crawling system to run sequentially by faking that the system has a single CPU.
Since the ripgrep binary has to be unpacked from the asar archive (we need to remember to add it to https://github.com/atom/atom/blob/master/script/lib/package-application.js#L101-L111 once we incorporate ripgrep into Atom), we need to patch the path that gets resolved by Electron's built-in require system.
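The patch boils down to pointing the spawn call at Electron's `app.asar.unpacked` directory instead of the archive itself. A sketch under that assumption (`patchAsarPath` and the exact regexp are illustrative, not necessarily what the PR does):

```javascript
// Binaries inside an asar archive can't be executed directly; Electron
// unpacks them next to the archive in `app.asar.unpacked`, so rewrite the
// resolved path to point there before spawning ripgrep.
function patchAsarPath(resolvedPath) {
  return resolvedPath.replace(/\bapp\.asar\b/, 'app.asar.unpacked');
}

// e.g.
// patchAsarPath('/Applications/Atom.app/.../app.asar/node_modules/vscode-ripgrep/bin/rg')
// returns the same path with `app.asar` swapped for `app.asar.unpacked`
```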
The testing so far seems to be going well:
A couple of drawbacks that I've discovered today:
@jasonrudolph any thoughts about these drawbacks?
Nice work, @rafeca!
This seems like a worthwhile tradeoff to me. We're trading a relatively tiny bit of additional disk space and bandwidth, and in return we spend less time waiting for fuzzy-finder results. 👍
We can generate a scopeless API token for our build account and use that token for atom/fuzzy-finder builds and atom/atom builds. Let's coordinate via Slack to make this happen.
Awesome! I agree it's a worthwhile tradeoff 😃 I've been able to verify that Windows builds are packaged correctly with […]
Oh, there's something else: this feature is behind a config param which is off by default. Since it's a fuzzy-finder config param, the setting is quite hidden and really hard to discover (you have to go to Settings -> Packages -> search for fuzzy-finder and click on its settings). This is how the setting looks: [screenshot] I have mixed feelings about this:
A couple of things that we could do:
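For reference, an Atom package setting like the one described above is declared via a config schema. A minimal sketch of how it might look in the fuzzy-finder's `package.json` (the `useRipGrep` key, default, and description are assumptions for illustration, not necessarily the PR's actual setting name):

```json
{
  "configSchema": {
    "useRipGrep": {
      "type": "boolean",
      "default": false,
      "description": "Use the ripgrep binary to crawl the list of project files (experimental)."
    }
  }
}
```

With a schema like this, the setting shows up in the package's settings pane as a checkbox, which matches the discoverability concern above.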
Another time we implemented an experimental setting was when we started using the new scoring method. Whenever people would open issues noting odd fuzzy-finding behavior, we would ask them to try turning on the setting and report back. I think we can do the same thing here: keep the setting off by default, but advertise that it exists and will become the default in the near future (as we did for tree-sitter).

Looking back, what I think we could have done better is set a timeline for when to enable the new scoring method by default. As you can see, it's been three or four years, and I'm still not sure if it's enabled by default yet on all the packages that have it as an option. That was something we handled much better with tree-sitter. I would opt towards making it off by default, advertising it when we can, and determining a target version to have it on by default. If by that target version it seems like it's still not ready for prime time, then we can re-evaluate.
@50Wliu that sounds good to me. If I understand our release cadence correctly, if this PR lands now it'll be included in v1.37.0, which will be released around May. We could aim to enable it by default in v1.38.0 or v1.39.0 depending on the feedback/issues reported by users (this can be decided later, though).
I love this idea! It doesn't need to be part of this PR, but I think it's worth trying in a follow-up PR. I'd be happy to help. 😄 For what it's worth, I think it will be fairly straightforward to implement using the notification API. |
Fantastic work, @rafeca.
I noted one super minor phrasing suggestion below.
I can't wait to start using this. 👏⚡
Awesome! We can implement it as a follow-up 😄
Co-Authored-By: rafeca <rafeca@gmail.com>
I read the release notes, followed the path to this PR, followed your guide on where to find the new setting, and hit the checkbox. I think there will be more people going down this route :)
Summary
This PR contains some minimal changes to demonstrate the usage of `ripgrep` in the fuzzy finder, so we can discuss the tradeoffs and the steps needed to be able to ship this.

This work comes as a follow-up to #367, after the suggestions of @smashwilson.
Benefits
Using `ripgrep` drastically speeds up the time to crawl medium and large repositories; we're talking about up to 14x faster crawl times. (The measurements were done the same way as in #366.)
Possible Drawbacks
The crawling behaviour with `ripgrep` is slightly different than the one currently implemented. I don't think that the changes are important enough (or even bad), but it's important to list them:

1. `ripgrep` also returns symlink destination files (e.g. if there's a symlink `./foo.js` which points to `./bar.js`, with `ripgrep` `foo.js` can also be opened). I think this is an improvement.
2. `ripgrep` returns all results alphabetically ordered.
3. Currently, `.gitignore` files from the sub-folders are ignored; with `ripgrep` they are taken into account. I think that this is an improvement.

Regarding 2., this change is quite noticeable, since the fuzzy finder currently displays the first 10 files returned by the crawler when it gets opened. This means that the first shown files will be slightly different than before (IMO we should change the default list of files that are shown when opening the fuzzy finder to show e.g. recently opened files or currently open files).
Alternate Designs
Other potential designs were presented here, and it seems like `ripgrep` is the best tool to use at this moment.

Regarding the current implementation, I've decided to use `vscode-ripgrep`, which is just a package that handles the download of the `ripgrep` binaries for several platforms. This has saved me a ton of time, but I'm not familiar enough with the process of installing bundled extensions in Atom to know if this package is suitable for the job.

Next steps
- […] `ripgrep` crawling.
- […] `ripgrep` works well on different scenarios.
- […] the `ripgrep` binary can be used correctly from a production build (e.g. does the asar packaging cause any issue?).

Applicable Issues
#271