Skip to content

PoC: add RepositoryPathBuf #1921

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

cruessler
Copy link
Contributor

This is a PoC that adds RepositoryPathBuf in gix_object::tree, for initial use in gix_object::tree::Entry. It is intended to make sure I start making the right changes in the right places.

This PR adds RepositoryPathBuf and uses it for gix_object::tree::Entry::filename. I adapted other places in the code until I got just check passing.

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this started.

Besides my notes, one key aspect of it is that there is a &RepositoryPath that only exists as reference, similar to PathBuf/&Path. An example for this is gix-ref::FileName|&FileNameRef, it needs some unsafe to do the conversion.

Finally, there must be an std::str::from_utf8_unchecked() equivalent for APIs that only have a single component, for example, or do their own validation naturally as part of the parsing.

This leads me to an interesting observation: gix_object::Tree::entries actually only has a RepositoryPathComponent, which might be a type we want explicitly just to say it's definitely not having slashes in there.

And with all this as a start, I think it will come together naturally.

}
}

impl std::ops::Deref for RepositoryPathPuf {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea would be to avoid having easy conversions to BString and treat it as distinct type with its own PathBuf-like API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! This was just a shortcut to get this first addition to compile with the least amount of changes possible (I should have been more clear about the context and the status of this initial commit).

A more general question: is the idea to have both the new types backed by inner: BString (and inner: &BStr) or rather by inner: Vec<u8> (and inner: &[u8])? If the latter, we would need to implement all the necessary functions that are currently provided by BString (and &BStr) ourselves, or am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path of least resistance (and the preferred one) should be to let it be backed by BString|&BStr. Maybe this will also be a great preparation for the time when std::…::ByteString can be used, when only the backing needs to be adjusted as most code is using RepositoryPathComponent|RepositoryPathBuf and friends.

}
}

impl From<BString> for RepositoryPathPuf {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no API that converts to this type without checking the input - paths now have to be relative.

@@ -249,14 +249,57 @@ impl Ord for EntryRef<'_> {
}
}

/// TODO
/// Keep in mind that the path separator always is `/`, independent of the platform.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And on top of that, it's a relative path with only Normal path components. Also it's always represented as bunch of bytes.

@cruessler
Copy link
Contributor Author

I removed Deref for BString and did a couple of follow-up changes to make the code compile.

Is this what you had in mind?

Next, I want to convert the From implementations into TryFrom ones.

@Byron
Copy link
Member

Byron commented Apr 11, 2025

Apologies for the late response, and thanks for pushing this forward.

I removed Deref for BString and did a couple of follow-up changes to make the code compile.

  • I added a method RepositoryPathBuf::to_bstring for use in places where surrounding code expects a BString. Is this a method that we would only use temporarily, so the goal would be to remove it once all the relevant places have been converted to RepositoryPathBuf?

Yes, that seems like the way to go. I didn't check the code and maybe the documentation of to_bstring() says it all, but I'd even go as far as to name it to_bstring_do_not_use() to make clear what it ought to be.

That seems like something one could expect for a bundle-of-bytes path type.

Is this what you had in mind?

Now that I see this I wonder if it should be a bundle of bytes, and as such can deref to bytes so it can be used similarly as Git would do in C.

Next, I want to convert the From implementations into TryFrom ones.

That sounds good!

I didn't look at the code yet, but would do so after your next session. If that feels a little unsafe, please let me know and I will get to it earlier.

@cruessler cruessler force-pushed the introduce-repository-path branch from 2b8f377 to 574d003 Compare April 12, 2025 11:19
@cruessler
Copy link
Contributor Author

Thanks for the feedback! I rebased the PR and made a few more changes. I mainly replaced the From implementations by TryFrom ones. I think I’ve completed all items on my TODO list now, so I’ll wait for your feedback before making other changes.

  • object::tree::iter::lookup_entry::utils returns Option, but uses try_into(). Does it need error handling?
  • There’s now a couple of try_into().expect("TODO") in tests. The tests don’t fail, so it appears these calls could also be replaced by try_into().unwrap() or a more descriptive expect.
  • There’s From<tree::EntryRef<'_>> for tree::Entry in gix-object/src/object/convert.rs that now needs to call try_into(). I got it to compile using try_into().expect("TODO"), but this does not feel like it is supposed to be the final state. Or can we expect try_into to never fail in this method?
  • I needed to add a dependency on gix-fs in gix-object for it to import gix_fs::stack::to_normal_path_components which I wanted to use in the TryFrom implementations.
  • Changing inner: BString to inner: Vec<u8> seems possible. We would need to change the Display implementation so that it converts the underlying bytes to characters, otherwise a couple of tests start failing.

@Byron
Copy link
Member

Byron commented Apr 12, 2025

Thanks so much for continuing this adventure with me :)!

While starting to work on looking at the PR more thoroughly, I realised that there is another thing to consider: Even though users of the API should be able to leverage the type-system, gitoxide itself still should be able to represent all states that Git can read on the lower levels. For instance, gix fsck can see all kinds of garbage and should be able to report it. If it would insta-fail because validation is in the wrong spots, that command couldn't exist.

Once again, I kept it quick, letting your questions drive me.

Thanks for the feedback! I rebased the PR and made a few more changes. I mainly replaced the From implementations by TryFrom ones. I think I’ve completed all items on my TODO list now, so I’ll wait for your feedback before making other changes.

  • object::tree::iter::lookup_entry::utils returns Option, but uses try_into(). Does it need error handling?

It's for tests and and it's related to filename - please see below - so, I'd skip that.

  • There’s now a couple of try_into().expect("TODO") in tests. The tests don’t fail, so it appears these calls could also be replaced by try_into().unwrap() or a more descriptive expect.

I think that's also related to filename in all the cases I saw.

  • There’s From<tree::EntryRef<'_>> for tree::Entry in gix-object/src/object/convert.rs that now needs to call try_into(). I got it to compile using try_into().expect("TODO"), but this does not feel like it is supposed to be the final state. Or can we expect try_into to never fail in this method?

I think filename in Entry* is out of scope. It's not a path, it's a filename, technically a single, slash-free component of a byte path.
One day this could be more strongly typed, but that can definitely happen another day.

  • I needed to add a dependency on gix-fs in gix-object for it to import gix_fs::stack::to_normal_path_components which I wanted to use in the TryFrom implementations.

I always thought that RepositoryPath* is so general that it should be in a lower-level/less-depended-on crate even, gix-path came to mind. Does that solve the issue?

  • Changing inner: BString to inner: Vec<u8> seems possible. We would need to change the Display implementation so that it converts the underlying bytes to characters, otherwise a couple of tests start failing.

I think it's OK to keep it as BString, knowing that one day it would be ByteString.

On another note, I'd turn to_bstring_do_not_use() into into_bstrIng_do_not_use(self)->BString, that way the clone() must be outside, which is preferable here.

And here it comes: Since RepositoryPathBuf is a bag of bytes, it should probably deref to &[u8] to easily plug into existing code. However, code that meddles with / should be reviewed as from there one might be able to offer more convenient APIs, sings like push_component or something like that.
It's notable that \ are perfectly valid within such a RepositoryPaths, they just aren't path-separators. The type is meant to be mostly an indicator of what's needed with some type-safety added.

In a way, ToNormalPathComponents is already the trait that makes inputting such paths most flexible. In case of RepositoryPath* it's actually infallible, something that's not representable by the trate yet and that is a pain to do.

So (at least) one question remains: where would the user get these paths? Entry::filename it is not, so one obvious place comes to mind: It's the gix_index::State. There, however, it should be made so that it can still return unvalidated paths, and validated ones. As far as I know it doesn't validate the paths it reads, which is the way to go, but the user should have a hard(er) time to retrieve such unvalidated paths, and an easy time to get validated RepositoryPath*s.

So in short, I think what should be done is to leave filename alone for now, move RepositoryPathBuf into gix-path, and then see where RepositoryPathBuf can actually be returned, gix-index comes to mind.

Does that make any sense?

@cruessler
Copy link
Contributor Author

cruessler commented Apr 13, 2025

So (at least) one question remains: where would the user get these paths? Entry::filename it is not, so one obvious place comes to mind: It's the gix_index::State . There, however, it should be made so that it can still return unvalidated paths, and validated ones. As far as I know it doesn't validate the paths it reads, which is the way to go, but the user should have a hard(er) time to retrieve such unvalidated paths, and an easy time to get validated RepositoryPath* s.

Just to make sure I understand correctly: are you suggesting we change path() -> &BStr to path() -> Result<&RepositoryPath> in the following code, adding something like path_unvalidated() -> &BStr along the way?

/// Return an entry's path, relative to the repository, which is extracted from its owning `state`.
pub fn path<'a>(&self, state: &'a State) -> &'a BStr {
state.path_backing[self.path.clone()].as_bstr()
}
/// Return an entry's path using the given `backing`.
pub fn path_in<'backing>(&self, backing: &'backing crate::PathStorageRef) -> &'backing BStr {
backing[self.path.clone()].as_bstr()
}

Also, I discovered there is gix_validate::path::component(). I was wondering whether that’s what should be used to validate path components for RepositoryPath (or even for ToNormalPathComponents).

Update: I’ve updated the first link that now correctly points to gix_index::Entry.

@Byron
Copy link
Member

Byron commented Apr 13, 2025

Just to make sure I understand correctly: are you suggesting we change path() -> &BStr to path() -> Result<&RepositoryPath> in the following code, adding something like path_unvalidated() -> &BStr along the way?

I don't know exactly which path() you are referring to, but if it would return a RepositoryPath in nature, it should now return the new type. And indeed, path_raw() would probably be the name for getting the &BStr version of it.

Also, I discovered there is gix_validate::path::component(). I was wondering whether that’s what should be used to validate path components for RepositoryPath (or even for ToNormalPathComponents).

That seems like the way to go to validate components in a RepositoryPath.
However, I wouldn't use it for ToNormalPathComponents just yet as there we really only need normal components to build an actual filesystem path. I acknowledge that this might change one day, but I don't think we have to do that here. Ideally, this PR makes a case for where RepositoryPath|Buf can genuinely improve the way certain APIs are used, and improve (type)safety.

@cruessler
Copy link
Contributor Author

cruessler commented Apr 13, 2025

I don't know exactly which path() you are referring to, but if it would return a RepositoryPath in nature, it should now return the new type. And indeed, path_raw() would probably be the name for getting the &BStr version of it.

I’m sorry, I included an incorrect link in my response, this is the correct one:

/// Return an entry's path, relative to the repository, which is extracted from its owning `state`.
pub fn path<'a>(&self, state: &'a State) -> &'a BStr {
state.path_backing[self.path.clone()].as_bstr()
}
/// Return an entry's path using the given `backing`.
pub fn path_in<'backing>(&self, backing: &'backing crate::PathStorageRef) -> &'backing BStr {
backing[self.path.clone()].as_bstr()
}

@Byron
Copy link
Member

Byron commented Apr 13, 2025

Yes, that's where I would have expected methods just like you indicate. Just that these days I try to stick more to _raw as a suffix because git2 does it. However… since this is more relevant for security, calling it *_unvalidated() is probably better.

@Byron
Copy link
Member

Byron commented Apr 13, 2025

Another, and probably more interesting usage of &RepositoryPath would be in the gix-ref crate (see this PR #1936) - there one can do a prefixed listing of refs, which must be a relative path.

This one is interesting as it's not relative to the worktree, but relative to the repository directory .git. Also, it can contain backslashes, but I think it's fair to not allow them for consistency, which would make RepositoryPath applicable.
If one wanted to do that, one could also do gix_fs::RelativePath, with the "must be relative to the workspace directory"-requirement dropped.

@cruessler cruessler force-pushed the introduce-repository-path branch from 71c0092 to 4a016f1 Compare April 14, 2025 13:09
@cruessler
Copy link
Contributor Author

I added gix_path::relative_path::RelativePath as a draft to make sure this is going in the right direction. Should this have been RepositoryPath?

  • gix::open::repository::ThreadSafeRepository::open_from_paths had as_bstr() which I had to turn into clone().try_into().expect(). This would need to be changed, I suppose.
  • I turned impl Into<Cow<'a, BStr>> into &'a RelativePath in a couple of places, mostly ones that had TODOs next to them.
  • I drafted one TryFrom implementation and included validation using gix_validate::path::component. There’s still a todo() left: would we need to use transmute for the conversion?
  • gix_ref::store_impl::file::overlay_iter::IterInfo::from_prefix runs checks on prefix_path that seem like they could be deleted, given that RelativePath is supposed to already enforce the constraints being checked. Is that correct?
  • gix_ref::store_impl::file::overlay_iter::Store::iter_prefixed_packed returns std::io::Result, but calls try_into to get a RelativePath. try_into does not return std::io::Result, though, so we would need to use map_err or make other changes.

I have not yet removed the older commits some of which might be obsolete by now. I want to to that once this PR is closer to its final version.

@Byron
Copy link
Member

Byron commented Apr 14, 2025

Should this have been RepositoryPath?

I think it's better to call it RelativePath - later it could just be a pub type RepositoryPath = RelativePath to make a point. Maybe… RepositoryPath means it's always slash separated, which isn't quite as much implied if one says RelativePath, but… it's easy to change later which might happen even.

  • gix::open::repository::ThreadSafeRepository::open_from_paths had as_bstr() which I had to turn into clone().try_into().expect(). This would need to be changed, I suppose.

I think so, too.

  • I turned impl Into<Cow<'a, BStr>> into &'a RelativePath in a couple of places, mostly ones that had TODOs next to them.

I think that's the way. In the non-plumbing crate, gix, this would have to become impl TryInto something something (similar to what's done with FullName|PartialName for references. It's a pain to implement this, actually, and maybe a trait would be better for that just to deal with error handling more easily.

  • I drafted one TryFrom implementation and included validation using gix_validate::path::component. There’s still a todo() left: would we need to use transmute for the conversion?

From something Owned to a reference won't work. But one can go from BStr to RelativePath.

  • gix_ref::store_impl::file::overlay_iter::IterInfo::from_prefix runs checks on prefix_path that seem like they could be deleted, given that RelativePath is supposed to already enforce the constraints being checked. Is that correct?

Yes, that duplication of validation can then be removed.

  • gix_ref::store_impl::file::overlay_iter::Store::iter_prefixed_packed returns std::io::Result, but calls try_into to get a RelativePath. try_into does not return std::io::Result, though, so we would need to use map_err or make other changes.

It returns std::io::Result for convenience or because there is nothing else that it returned before. Fortunately, std::io::Error::other(err) can take anything.

I hope that helps.

@cruessler
Copy link
Contributor Author

I hope we’re getting closer. 😄

  • I turned the clone().try_into().expect() in gix::open::repository::ThreadSafeRepository::open_from_paths into as_bstr().try_into().ok()? because it’s inside a closure that returns an Option. Does that work?
  • I had to turn forbid[unsafe_code] into deny[unsafe_code] for RepositoryPath::new_unchecked to have an unsafe block. I have never written unsafe code before, so I hope this is correct.
  • gix_validate::path::component does not seem to check whether input is a normal component, so we would need to also check in all try_from implementations. Is that correct? If it is, we can’t seem to use gix_fs::stack::ToNormalPathComponents because gix-fs depends on gix-path, not the other way around. So I would lean towards having if !matches(component, Component::Normal) in all try_from implementations instead.

@Byron
Copy link
Member

Byron commented Apr 14, 2025

I hope we’re getting closer. 😄

It sure looks like it 🔥🙌.

  • I turned the clone().try_into().expect() in gix::open::repository::ThreadSafeRepository::open_from_paths into as_bstr().try_into().ok()? because it’s inside a closure that returns an Option. Does that work?

There Git would have to decide - does it error if the path (as configured) is absolute?

  • I had to turn forbid[unsafe_code] into deny[unsafe_code] for RepositoryPath::new_unchecked to have an unsafe block. I have never written unsafe code before, so I hope this is correct.

This is the way 👋.

  • gix_validate::path::component does not seem to check whether input is a normal component, so we would need to also check in all try_from implementations. Is that correct? If it is, we can’t seem to use gix_fs::stack::ToNormalPathComponents because gix-fs depends on gix-path, not the other way around. So I would lean towards having if !matches(component, Component::Normal) in all try_from implementations instead.

Actually, that's a great catch and probably a Bug. I really am trying to make progress in #1825 and will push a fix there. Right now, . and .. are forbidden only when special Windows paths are disallowed, but I think this is wrong and these should always be disallowed as they aren't ever a valid path component for trees.

@cruessler
Copy link
Contributor Author

Actually, that's a great catch and probably a Bug. I really am trying to make progress in #1825 and will push a fix there. Right now, . and .. are forbidden only when special Windows paths are disallowed, but I think this is wrong and these should always be disallowed as they aren't ever a valid path component for trees.

Awesome! I’ll rebase once the fix is available.

@cruessler
Copy link
Contributor Author

  • I turned the clone().try_into().expect() in gix::open::repository::ThreadSafeRepository::open_from_paths into as_bstr().try_into().ok()? because it’s inside a closure that returns an Option. Does that work?

There Git would have to decide - does it error if the path (as configured) is absolute?

Git errors, so I’ll turn this into an error:

❯ env GIT_REPLACE_REF_BASE=/refs/rebase git log
BUG: refs.c:540: pattern must not start with '/'

Relevant context:

@cruessler
Copy link
Contributor Author

  • I’ve turned a non-relative prefix into an error in open_from_paths. The control flow in the changed section of open_from_paths is a bit nested now, but I didn’t find a way to simplify.
  • RelativePath::new_unchecked accepts &BStr, same as Full|PartialNameRef::new_unchecked. I was wondering whether it should accept &[u8] or whether to introduce from_bytes after the model of BStr or from_bytes_unchecked after oid.

Now, I’m waiting for #1825 to be merged. In the meantime, I’ll keep this PR up to date and try to get CI to pass.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants