
File/Dir - cannot use path made of non-utf8 bytestrings #79

Open
gabriel-v opened this issue Aug 18, 2023 · 0 comments

gabriel-v commented Aug 18, 2023

All the code in `redun.File` and friends assumes the path is a single, valid UTF-8 string.

But Python also accepts bytes as path objects. This is needed when working with filesystems that encode filenames in something other than UTF-8.
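A small standard-library illustration of the situation. The filename here is a hypothetical Latin-1 encoded name: its raw bytes are not valid UTF-8, yet Python's path functions (`open()`, `os.listdir()`, `os.path.join()` and so on) accept such bytes directly. `posixpath` is used instead of `os.path` only so the output is the same on every platform.

```python
import posixpath

# Hypothetical filename written by a system that encodes names as Latin-1.
name = "caf\u00e9.txt".encode("latin-1")
print(name)  # b'caf\xe9.txt'

# These raw bytes are not valid UTF-8, so they cannot simply be decoded to str.
try:
    name.decode("utf-8")
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False
print(is_utf8)  # False

# Path functions accept bytes, as long as every component is bytes.
full = posixpath.join(b"/data", name)
print(full)  # b'/data/caf\xe9.txt'
```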

There are some functions that crash when `File` is given a bytes path:

  • File: `get_filesystem_class()` calls `get_proto()`, and the `urlparse` call there fails on non-UTF-8 byte strings
  • Dir: all of the above, plus concatenating the glob pattern fails with `TypeError: Can't mix strings and bytes in path components`
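A minimal reproduction of both failures using only the standard library. It assumes, as the report suggests, that `get_proto()` passes the path to `urllib.parse.urlparse`, and that `Dir` joins a str glob pattern such as `"*"` onto the path with an `os.path.join`-style call. `try_call` is a hypothetical demo helper, not part of redun.

```python
import posixpath
from urllib.parse import urlparse

# A Latin-1 encoded path: b"\xe9" is "é" in Latin-1 but is not valid UTF-8.
path = b"/data/caf\xe9"


def try_call(func, *args):
    """Hypothetical helper: return the exception func(*args) raises, or None."""
    try:
        func(*args)
    except Exception as error:  # demo only: we want to inspect the failure
        return error
    return None


# urlparse() decodes bytes arguments as strict ASCII, so non-ASCII bytes fail.
parse_error = try_call(urlparse, path)
print(type(parse_error).__name__)  # UnicodeDecodeError

# Joining a str glob pattern onto a bytes directory path also fails.
join_error = try_call(posixpath.join, path, "*")
print(join_error)  # Can't mix strings and bytes in path components
```

Because `urlparse()` decodes bytes as strict ASCII, even valid UTF-8 bytes such as `b"/data/caf\xc3\xa9"` are rejected, not only non-UTF-8 ones.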

The workaround is to hack:

I also tried changing `self.classes.File`, but it can't be overwritten (it uses `__getitem__`), so one would have to replace the whole `FileClasses` thing.

I think one of the following could be done here:

  • either fix `File`, `Dir` and friends to work with non-UTF-8 bytestring paths
  • or allow users of the library to override `FileClasses`, `get_filesystem_class` and friends without so much monkeypatching
  • or refactor the whole thing to use only `pathlib.Path`, as requested in "Use pathlib.Path instead of strings for path" #8
    • though I think `get_proto()` and `urlparse` would still crash when given non-UTF-8 bytestrings
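One possible direction for the first option, sketched under assumptions: decode bytes paths with Python's `surrogateescape` error handler (PEP 383), which maps every undecodable byte to a lone surrogate code point and back again. Str-only code such as `urlparse()` and glob-pattern joins then keeps working, and the exact original bytes can be recovered. This is a standard Python technique, not something redun is known to do; `to_text`, `to_bytes` and `ENCODING` are hypothetical names, and UTF-8 is assumed as the base encoding.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: UTF-8 is the base encoding; any byte that is not valid UTF-8
# is carried through as a lone surrogate (U+DC80..U+DCFF) instead of failing.
ENCODING = "utf-8"


def to_text(path: bytes) -> str:
    """Hypothetical helper: decode a bytes path without losing information."""
    return path.decode(ENCODING, errors="surrogateescape")


def to_bytes(path: str) -> bytes:
    """Hypothetical helper: undo to_text(), restoring the original bytes."""
    return path.encode(ENCODING, errors="surrogateescape")


raw = b"/data/caf\xe9"                 # Latin-1 name, not valid UTF-8
text = to_text(raw)                     # '/data/caf\udce9'

# str-only code paths now work on the decoded path...
scheme = urlparse(text).scheme          # '' (no scheme in the path)
pattern = posixpath.join(text, "*")     # '/data/caf\udce9/*'

# ...and the exact original bytes come back when touching the filesystem.
print(to_bytes(pattern))                # b'/data/caf\xe9/*'
```

On POSIX systems, `os.fsdecode()` and `os.fsencode()` apply the same handler using the filesystem encoding. One caveat: strings containing lone surrogates cannot be encoded with strict UTF-8 (for example when printing or serializing them), so any code that writes such paths out would have to convert them back with `to_bytes()` or use an explicit error handler.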

What do you think?
