Check if file is ignored according to git #98
Doesn't launch the job if the file is ignored (this should be guarded by a configuration parameter).
To test this and have more insight about the exclusion process, launch it with info level log:
|
in the job configuration
src/ignorer.rs
let index = worktree.index()?;

// there doesn't seem to be any public API for looking at "excludes" without caching
// so we create a cache
Cache is probably a misnomer here as it doesn't cache anything - it merely builds state to be able to perform these lookups and be fast with non-random input ordered according to the git index.
I think I ran out of words as it contains an ignore stack internally, and this is adding even more information to bring everything together.
/// Tell whether the given path is excluded according to
/// either the global gitignore rules or the ones of the repository
pub fn excludes(&mut self, file_path: &Path) -> Result<bool> {
I think as it stands, each path will trigger reading all gitignore files. They are indeed held in the excludes() data structure, and ideally that structure is kept around. I see how this isn't possible right now, and believe that the current reference is likely a premature optimization rather than a necessity. This will change for sure; in fact it's already done in the latest main, which should greatly improve performance as the cache can actually reuse state if it's kept around.
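To make the cost difference concrete, here is a minimal sketch, assuming a hypothetical `Rules` type that stands in for the parsed ignore files. It is not gitoxide's API; the `loads` counter only exists to show how often the (pretend) parsing happens when the state is rebuilt per path versus kept around.

```rust
use std::cell::Cell;
use std::path::Path;

/// Hypothetical stand-in for parsed ignore rules; not a gitoxide type.
pub struct Rules {
    suffixes: Vec<String>,
}

impl Rules {
    /// Pretends to read and parse the ignore files.
    /// `loads` counts how many times that work is done.
    pub fn load(loads: &Cell<usize>) -> Self {
        loads.set(loads.get() + 1);
        Rules {
            suffixes: vec![".log".to_string(), ".tmp".to_string()],
        }
    }

    /// Simplified check: a path is excluded if it ends with a known suffix.
    pub fn excludes(&self, path: &Path) -> bool {
        let name = path.to_string_lossy();
        self.suffixes.iter().any(|s| name.ends_with(s.as_str()))
    }
}

/// Rebuilds the rules for every path, which is what the review comment
/// suspects happens on each lookup.
pub fn check_rebuilding(paths: &[&Path], loads: &Cell<usize>) -> Vec<bool> {
    paths
        .iter()
        .map(|p| Rules::load(loads).excludes(p))
        .collect()
}

/// Builds the rules once and reuses them for every path.
pub fn check_reusing(paths: &[&Path], loads: &Cell<usize>) -> Vec<bool> {
    let rules = Rules::load(loads);
    paths.iter().map(|p| rules.excludes(p)).collect()
}
```

Both functions return the same answers; only the number of loads differs (one per path versus one in total).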
"the current reference is likely a premature optimization rather than a necessity"
I feel you. I fell in the same trap in my first libs too, and it's a pain to fix.
TBH checking whether a file is excluded takes less than 1 ms right now and that's fine for bacon.
I will think about this. The number one use of lifetimes in structs is platforms: they add some data around an operation and perform it, keeping a reference to their originating Repository. As long as these are basically free, I think they can be created on demand, with the Repository being cloned to where it is needed; that's the intent.
If they aren't free though, like the Cache here, I think it's good advice to rather clone the Repository into it to make it standalone, or do whatever else it takes. I think some useful rule emerges from this experience and I will put it into words in DEVELOPMENT.md to make it official.
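The rule described above can be sketched as follows. All types here (`Repository`, `HeadPlatform`, `ExcludesState`, `Ignorer`) are hypothetical illustrations, not gitoxide's real definitions: the cheap platform borrows the repository and is created on demand, while the expensive state owns a clone of it and therefore carries no lifetime.

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical repository handle; cloning only bumps a reference count.
#[derive(Clone)]
pub struct Repository {
    root: Arc<PathBuf>,
}

impl Repository {
    pub fn open(root: impl Into<PathBuf>) -> Self {
        Repository { root: Arc::new(root.into()) }
    }

    /// Cheap "platform": borrows the repository and is created on demand.
    pub fn head_platform(&self) -> HeadPlatform<'_> {
        HeadPlatform { repo: self }
    }

    /// Expensive state: owns a clone, so it has no lifetime and can be stored.
    pub fn excludes_state(&self) -> ExcludesState {
        ExcludesState { repo: self.clone(), lookups: 0 }
    }
}

/// Tied to the repository by a lifetime; fine because it is free to recreate.
pub struct HeadPlatform<'repo> {
    repo: &'repo Repository,
}

impl HeadPlatform<'_> {
    pub fn root(&self) -> &Path {
        self.repo.root.as_path()
    }
}

/// Standalone state that is costly to build and worth keeping around.
pub struct ExcludesState {
    repo: Repository,
    lookups: usize,
}

impl ExcludesState {
    pub fn root(&self) -> &Path {
        self.repo.root.as_path()
    }

    /// Placeholder rule for illustration only: everything under `target/`.
    pub fn is_excluded(&mut self, relative: &str) -> bool {
        self.lookups += 1;
        relative.starts_with("target/")
    }
}

/// Long-lived struct holding the state, with no lifetime parameter.
pub struct Ignorer {
    state: ExcludesState,
}

impl Ignorer {
    pub fn new(repo: &Repository) -> Self {
        Ignorer { state: repo.excludes_state() }
    }

    pub fn excludes(&mut self, relative: &str) -> bool {
        self.state.is_excluded(relative)
    }

    pub fn lookups(&self) -> usize {
        self.state.lookups
    }

    pub fn root(&self) -> &Path {
        self.state.root()
    }
}
```

Because `Ignorer` owns its state, it stays usable after the original `Repository` value is dropped, which is what a long-running watcher needs.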
BTW @Byron I don't know if you may find this interesting, but I've also implemented a gitignoring stack (stacking the parsed gitignore files as I imagine you do): https://github.com/Canop/broot/blob/main/src/git/ignore.rs#L155
This was done for broot with very specific performance concerns (breadth-first tree diving).
The reasons I didn't take that for bacon were:
- I didn't want to implement myself looking for the global gitignore rules (I use git2 in broot for that)
- I wanted to try gitoxide for other programs of mine (and maybe replacing git2 with gitoxide in my current programs)
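For readers unfamiliar with the idea of stacking parsed gitignore files, here is a minimal sketch under strong simplifying assumptions. It is neither broot's nor gitoxide's implementation, and the names `IgnoreFile` and `IgnoreStack` are made up for illustration. Patterns are limited to exact file names and `*suffix` globs; negation, directory anchoring and last-match-wins precedence from real gitignore semantics are left out.

```rust
/// One parsed ignore file, simplified to exact names and `*suffix` patterns.
pub struct IgnoreFile {
    patterns: Vec<String>,
}

impl IgnoreFile {
    /// Parses the content, skipping blank lines and `#` comments.
    pub fn parse(content: &str) -> Self {
        let patterns = content
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty() && !line.starts_with('#'))
            .map(String::from)
            .collect();
        IgnoreFile { patterns }
    }

    fn matches(&self, file_name: &str) -> bool {
        self.patterns.iter().any(|p| match p.strip_prefix('*') {
            // `*.log` matches any name ending in `.log`
            Some(suffix) => file_name.ends_with(suffix),
            // anything else must match the file name exactly
            None => p.as_str() == file_name,
        })
    }
}

/// Stack of ignore files from the repository root down to the current directory.
#[derive(Default)]
pub struct IgnoreStack {
    files: Vec<IgnoreFile>,
}

impl IgnoreStack {
    /// Call when descending into a directory; pass "" if it has no ignore file
    /// so the stack depth stays aligned with the directory depth.
    pub fn enter_dir(&mut self, ignore_content: &str) {
        self.files.push(IgnoreFile::parse(ignore_content));
    }

    /// Call when leaving the directory again.
    pub fn leave_dir(&mut self) {
        self.files.pop();
    }

    /// A file is excluded if any ignore file on the stack matches it.
    pub fn excludes(&self, file_name: &str) -> bool {
        self.files.iter().any(|f| f.matches(file_name))
    }
}
```

The point of the stack is that each ignore file is parsed once when its directory is entered and dropped when the traversal leaves it, so rules from parent directories are reused for every file below them.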
Thanks for sharing! I love the perceived simplicity of the git-ignore implementation, it fits in 240 lines after all :)!
With the Cache type unchained from the lifetime in main, you would now be able to reuse it for each lookup, and that should yield much better performance.
From a correctness point of view, it's probably (hopefully) a good idea to use gitoxide even if the performance is just similar, as I tried my best to validate the implementation against git with many, many test cases. Of course I hope you won't take my word for it and validate it yourself; gitoxide strives to yield the same results as git.
Thus, I hope you will end up using gitoxide in more of your projects, and if there is anything preventing that, I'd love to know to get a chance to fix it.
gitignoring is only part of using git, and I definitely don't want to expand into building a general git crate while there's already an ambitious project, so it's my clear intention to try and use gitoxide ;)
Related to #675 and Canop/bacon#98.
Doesn't launch the job if the file is ignored
There's a configuration parameter at the job level to disable this filtering.
Fix #32