Frictionless package cannot understand windows folder delimiters on MacOS #2

Open
larsyencken opened this issue Dec 1, 2021 · 7 comments

Comments

@larsyencken

Hey @semio o/

If you iterate over the package with frictionless on macOS, using code like this:

import frictionless

p = frictionless.Package('datapackage.json')
for resource in p.resources:
    df = resource.to_pandas()

You get a result like this:

FrictionlessException: [scheme-error] The data source could not be successfully loaded: [Errno 2] No such file or directory: 'income_mountain\\ddf--datapoints--income_mountain_50bracket_shape_for_log--by--country--year.csv'

It looks like the Windows path delimiter fails on Unix systems. I don't have a Windows machine, but if you have a few moments, could you check whether using forward slashes (income_mountain/...) gets translated by Frictionless on Windows?

If not, then it's probably a problem in the frictionless spec.

@larsyencken
Author

We can always work around it if need be just by replacing the delimiters as we go, so it's not a blocker on our side.
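
The workaround mentioned above could look roughly like the sketch below, which rewrites backslashes in each resource's path before the descriptor is handed to a loader. The function name normalize_resource_paths is hypothetical, and the sketch assumes the descriptor follows the usual Data Package layout: a "resources" list whose entries carry a "path" that is either a string or a list of strings. It also assumes no file name legitimately contains a backslash.

```python
import copy


def normalize_resource_paths(descriptor: dict) -> dict:
    """Return a copy of a Data Package descriptor that uses '/' in resource paths.

    Hypothetical helper: assumes each resource's "path" is a str or a list of str.
    """
    fixed = copy.deepcopy(descriptor)  # leave the caller's dict untouched
    for resource in fixed.get("resources", []):
        path = resource.get("path")
        if isinstance(path, str):
            resource["path"] = path.replace("\\", "/")
        elif isinstance(path, list):
            resource["path"] = [p.replace("\\", "/") for p in path]
    return fixed


if __name__ == "__main__":
    example = {"resources": [{"name": "demo", "path": "income_mountain\\demo.csv"}]}
    print(normalize_resource_paths(example)["resources"][0]["path"])
```

Working on the plain dictionary keeps the sketch independent of any particular frictionless version; the normalized descriptor can then be passed to whichever loader is in use.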

@larsyencken changed the title from "Frictionless package cannot understand folder delimiters" to "Frictionless package cannot understand windows folder delimiters on MacOS" on Dec 1, 2021
@semio
Contributor

semio commented Dec 2, 2021

Hi Lars, I was running our tools, which are supposed to work cross-platform, in a Windows environment. Apparently they are not cross-platform enough; thanks for reporting this issue :)

I did some googling and found this article about paths in Python: https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

Python has a hack where it will recognize either kind of slash when you call open() on Windows...
And Python’s support for mixing slash types is a Windows-only hack that doesn’t work in reverse. Using backslashes in code will totally fail on a Mac

So I think it's better to produce file paths in Unix style. I will change that in our tools.

Regarding the frictionless library, I can't even run import frictionless on Windows. I am not sure whether it's a problem in my environment setup or an issue in frictionless. I will experiment some more a bit later and keep you updated.
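
One possible way to produce Unix-style paths when writing a descriptor is pathlib's as_posix(), sketched below. The helper name descriptor_path and the C:\data\dataset location are made up for illustration; this is not necessarily how our tools will do it.

```python
from pathlib import PurePath, PureWindowsPath


def descriptor_path(file_path: PurePath, base_dir: PurePath) -> str:
    """Return file_path relative to base_dir, always with '/' as the separator."""
    return file_path.relative_to(base_dir).as_posix()


if __name__ == "__main__":
    # PureWindowsPath lets the Windows behaviour be shown on any platform.
    base = PureWindowsPath(r"C:\data\dataset")  # hypothetical location
    csv = base / "income_mountain" / "demo.csv"
    print(str(csv.relative_to(base)))  # income_mountain\demo.csv
    print(descriptor_path(csv, base))  # income_mountain/demo.csv
```

With a regular Path on Windows, str() yields backslashes, while as_posix() yields forward slashes on every platform.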

@semio
Contributor

semio commented Dec 2, 2021

This is the error I get when I run import frictionless. I don't have any settings files at all, as this is the first time I have installed it:

~\Anaconda3\envs\gapminder\lib\site-packages\frictionless\helpers.py in <module>
     20 from urllib.parse import urlparse, parse_qs
     21 from _thread import RLock  # type: ignore
---> 22 from . import settings
     23
     24

~\Anaconda3\envs\gapminder\lib\site-packages\frictionless\settings.py in <module>
     24 REPORT_PROFILE = json.loads(read_asset("profiles", "report.json"))
     25 STATUS_PROFILE = json.loads(read_asset("profiles", "status.json"))
---> 26 SCHEMA_PROFILE = json.loads(read_asset("profiles", "schema", "general.json"))
     27 RESOURCE_PROFILE = json.loads(read_asset("profiles", "resource", "general.json"))
     28 TABULAR_RESOURCE_PROFILE = json.loads(read_asset("profiles", "resource", "tabular.json"))

~\Anaconda3\envs\gapminder\lib\site-packages\frictionless\settings.py in read_asset(*paths)
     11     dirname = os.path.dirname(__file__)
     12     with open(os.path.join(dirname, "assets", *paths)) as file:
---> 13         return file.read().strip()
     14
     15

UnicodeDecodeError: 'gbk' codec can't decode byte 0xac in position 7809: illegal multibyte sequence

semio added a commit that referenced this issue Dec 2, 2021
@larsyencken
Author

Weird, I couldn't spot any reported bugs in their Python package that match this.

If you want, we could add a GitHub Action that checks, for each repo, that the frictionless data reads smoothly.
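
As a rough idea of what such a check might run, here is a lightweight precheck that uses only the standard library. It does not load anything through frictionless itself; it only flags backslashes and missing files in the resource paths of a datapackage.json. The function name find_path_problems is hypothetical, and the "://" test for remote paths is a simplifying assumption.

```python
import json
from pathlib import Path


def find_path_problems(package_file: Path) -> list:
    """List problems with the resource paths declared in a datapackage.json."""
    descriptor = json.loads(package_file.read_text(encoding="utf-8"))
    problems = []
    for resource in descriptor.get("resources", []):
        paths = resource.get("path", [])
        if isinstance(paths, str):
            paths = [paths]
        for path in paths:
            if "\\" in path:
                problems.append(f"{path}: contains a backslash")
            elif "://" not in path and not (package_file.parent / path).is_file():
                # Local paths are resolved relative to the descriptor's folder.
                problems.append(f"{path}: file not found")
    return problems
```

A CI job could call find_path_problems(Path("datapackage.json")) and fail the build when the returned list is non-empty. A fuller check would also read every resource with frictionless, as in the snippet at the top of this issue.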

@semio
Contributor

semio commented Dec 5, 2021

@larsyencken I figured it out. The "general.json" file is included in the frictionless package. On my Windows system, Python tries to read this file with the wrong encoding.

According to the Python docs, the default encoding for open() is platform dependent. It is usually UTF-8 on Linux, but not on Windows (my system uses GBK, as the traceback shows). I needed to set the PYTHONUTF8 environment variable so that Python runs in UTF-8 mode, and then I could finally import the package.

I then tried to load this datapackage with frictionless and it worked, so frictionless does work with forward slashes on Windows.
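
For anyone who wants to check their own setup, the small sketch below reports the two settings involved. The function name text_encoding_report is made up for illustration; locale.getpreferredencoding and sys.flags.utf8_mode are standard library (the latter requires Python 3.7 or later).

```python
import locale
import sys


def text_encoding_report() -> dict:
    """Collect the settings that decide how open() decodes text by default."""
    return {
        # Encoding that open() falls back to when no encoding= argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        # 1 when UTF-8 mode is active (for example via PYTHONUTF8=1), otherwise 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


if __name__ == "__main__":
    for key, value in text_encoding_report().items():
        print(f"{key}: {value}")
```

Running it once with and once without PYTHONUTF8=1 set in the environment should show whether the variable is being picked up.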

@larsyencken
Author

Good find! That's definitely a frictionless bug, though. I think they should explicitly pick UTF-8.
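
To illustrate what explicitly picking UTF-8 could look like for a helper shaped like the read_asset function in the traceback, here is a minimal sketch. The name read_text_asset and the base_dir parameter are hypothetical, and this is not necessarily the change made upstream.

```python
import os


def read_text_asset(base_dir: str, *paths: str) -> str:
    """Read a bundled text file as UTF-8, regardless of the platform default."""
    # Passing encoding= means open() no longer depends on the locale's code page.
    with open(os.path.join(base_dir, *paths), encoding="utf-8") as file:
        return file.read().strip()
```

With the encoding stated, the same bytes decode the same way on Linux, macOS, and Windows, without needing PYTHONUTF8.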

@semio
Contributor

semio commented Dec 24, 2021

Yep, here it is: frictionlessdata/frictionless-py#962
