Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

geodataframe parquet storage exception #202

Open
tgrandje opened this issue Aug 6, 2024 · 5 comments
Open

geodataframe parquet storage exception #202

tgrandje opened this issue Aug 6, 2024 · 5 comments
Assignees

Comments

@tgrandje
Copy link
Collaborator

tgrandje commented Aug 6, 2024

Hello @hadrilec @tfardet

There seems to be a bug on the parquet storage for geodataframes:
geodataframes are not rightly handled by pandas/pyarrow.

It seems a simple try/except and geopandas overlay might do the trick. I'll check and propose a PR.

Sample code:

from pynsee.geodata import get_geodata
cities = get_geodata("ADMINEXPRESS-COG-CARTO.LATEST:commune")

The parquet file will not be written in the cache dir, even if pynsee says it was saved.

The warning was also not displayed and the culprit seems to be here. Do any of you remember why this was added in the first place?

(Sorry I'm not present on pynsee anymore, I've had too much on my plate. Hopefully it will be better before the end of this year)

@tgrandje tgrandje self-assigned this Aug 6, 2024
@tgrandje
Copy link
Collaborator Author

tgrandje commented Aug 6, 2024

This seems to be a clear use case for adding geopandas as a dependency : #169

@tfardet
Copy link
Collaborator

tfardet commented Aug 6, 2024

This warning filter was before my time, but indeed, I just noticed the issue just today as well ^^
I'm also obviously 100% in favor of adding geopandas as dependency!

@hadrilec
Copy link
Contributor

hadrilec commented Aug 7, 2024

I would like to avoid to have geopandas as a dependency because I struggled once to install this package on my laptop. If I remember correctly, this package comes with special external dependencies. I dont see why it is absolutely needed to include it as a dependency.

@tgrandje
Copy link
Collaborator Author

tgrandje commented Aug 7, 2024

Was this recent? I don't recall having any trouble recently. At least not since Christoph Gohlke's alternative repo with built whl went (almost) down and the team behind Fiona was forced to compile the whl for windows directly on pypi... (And believe me, I was the first to struggle with issues on windows server at that time.)

I'd say the compromise would be to declare geopandas as an optional dependency. If geopandas is present, then we can write and read geoparquets files. If not, we resort to the "old" pickle way and set a warning to the user.

@tfardet
Copy link
Collaborator

tfardet commented Aug 7, 2024

I would like to avoid to have geopandas as a dependency because I struggled once to install this package on my laptop

As I've said before, I also don't think this is relevant anymore since geopandas has been using wheels for a long time.

I believe anyone who receives geodata in python as something that looks like a dataframe would (rightfully) expect a GeoDataFrame and handing them something that is not and does not even behave like one is not a good idea.

EDIT: if the GeoFrDataFrame is also the cause for the geoparquet issue, that's also another very good reason to get rid of it

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants