Feature/249 support geoparquet #254

Open: wants to merge 7 commits into main

nathanjmcdougall (Contributor) commented Jul 19, 2024

To resolve #249.

I am developing on Windows, so adding a dependency and updating the requirements file is a little cumbersome.

My approach was to use uv:

uv pip compile setup.cfg --extra=doc --extra=test --python-platform=linux --output-file=requirements/dev.txt --unsafe-package=pip --unsafe-package=setuptools

I then hand-edited the diff to revert some unnecessary stylistic changes introduced by uv.
Also, pip-compile ignores platform specifiers, so the existing requirements file pins appnope==0.1.4 (a Darwin-only dependency of ipykernel) regardless of platform. uv honours the specifier, so with --python-platform=linux it dropped appnope; I manually added it back to minimize the diff.
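One way to double-check a hand-massaged lock file like this is to compare its pins against the previous pip-compile output and confirm that only the intended packages changed; a dropped Darwin-only pin such as appnope would show up as "removed". Below is a minimal stdlib-only sketch. `parse_pins` and `diff_pins` are hypothetical helpers (not part of pip-tools or uv), and the parser assumes simple `name==version` lines, skipping comments, hashes and anything more exotic.

```python
import re

# A pinned requirement line such as "appnope==0.1.4" (optionally followed by
# " ; <marker>"). Comment lines like "    # via ipykernel" do not match.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s;#]+)")


def parse_pins(text: str) -> dict[str, str]:
    """Map each normalized package name to its pinned version."""
    pins: dict[str, str] = {}
    for line in text.splitlines():
        match = PIN_RE.match(line.strip())
        if match:
            # PEP 503-style normalization: "Foo_Bar" and "foo-bar" compare equal.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def diff_pins(old: str, new: str) -> dict[str, list[str]]:
    """Report packages removed, added, or re-pinned between two requirements files."""
    old_pins, new_pins = parse_pins(old), parse_pins(new)
    return {
        "removed": sorted(old_pins.keys() - new_pins.keys()),
        "added": sorted(new_pins.keys() - old_pins.keys()),
        "changed": sorted(
            name
            for name in old_pins.keys() & new_pins.keys()
            if old_pins[name] != new_pins[name]
        ),
    }
```

For example, `diff_pins("appnope==0.1.4\n    # via ipykernel\n", "")` returns `{'removed': ['appnope'], 'added': [], 'changed': []}`. In practice the two arguments would be the old and new contents of requirements/dev.txt (the old one obtainable with, say, `git show main:requirements/dev.txt`).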

This might be an argument in favour of switching from pip-compile to uv pip compile, but it might also be an argument in favour of me developing on Linux 👀

nathanjmcdougall (Contributor Author) commented:

The pipeline failed, but from the raw logs it seems to be just an intermittent networking issue.
The test pins.boards.BoardManual, which uses a GitHub-hosted dataset, is returning this error:

requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /repos/rstudio/pins-python

@@ -206,10 +234,20 @@ def default_title(obj, name):
import pandas as pd

if isinstance(obj, pd.DataFrame):
try:
Collaborator commented:

This might play really nicely with the changes you've made in #263 👀 What if we merge that PR first and then refactor so that GeoPandas DataFrames are handled by _get_df_family?

Contributor Author:

Totally agree.

# TODO(compat): title says CSV rather than data.frame
# see https://github.com/machow/pins-python/issues/5
shape_str = " x ".join(map(str, obj.shape))
-return f"{name}: a pinned {shape_str} DataFrame"
+return f"{name}: a pinned {shape_str} {obj_name}"
isabelizimm (Collaborator) commented Jul 23, 2024:

Also, as someone who is not a GeoPandas frequent flyer: is it important to you to denote that it is a GeoPandas DataFrame rather than just "DataFrame"? I ask partly because we don't distinguish between pandas and polars (although I do realize that is mostly because polars DataFrames do not round-trip).

Contributor Author:

Good point - I think GeoDataFrame would be more informative in this case.
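A minimal sketch of a default title that names the concrete class, so a GeoDataFrame is reported as such while a plain pandas DataFrame keeps its current wording. Assumption: to stay runnable without pandas or geopandas, this version duck-types any object with a `columns` attribute and a 2-D `shape` as a data frame, whereas the code under review uses `isinstance(obj, pd.DataFrame)`; the way `obj_name` is derived here is illustrative, not the PR's implementation.

```python
def default_title(obj: object, name: str) -> str:
    """Sketch: build a pin title that names the concrete data frame class."""
    shape = getattr(obj, "shape", None)
    # Assumption: duck-type data frames as objects with `columns` and a 2-D shape.
    if hasattr(obj, "columns") and shape is not None and len(shape) == 2:
        # "DataFrame" for pandas, "GeoDataFrame" for geopandas, and so on.
        obj_name = type(obj).__name__
        shape_str = " x ".join(map(str, shape))
        return f"{name}: a pinned {shape_str} {obj_name}"
    # Fallback for non-tabular objects.
    return f"{name}: a pinned {type(obj).__qualname__} object"
```

For a 3-row, 2-column `geopandas.GeoDataFrame` pinned as "cities", this would produce "cities: a pinned 3 x 2 GeoDataFrame".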

Contributor Author:

GeoDataFrames carry extra metadata, such as a coordinate reference system and an assigned geometry column, and in general "feel" quite different from a standard DataFrame. The naming convention reflects this: variables holding them use the suffix _gdf instead of _df.
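For context on where that extra metadata lives once a GeoDataFrame is written as GeoParquet: the file stores it as JSON under the `geo` key of the Parquet file metadata, including the primary geometry column and, optionally, a PROJJSON CRS per column. Below is a stdlib-only sketch that summarizes that JSON. `summarize_geo_metadata` is a hypothetical helper, and how pins would obtain the JSON is left open; with pyarrow, one option is `pyarrow.parquet.read_schema(path).metadata[b"geo"]`.

```python
import json
from typing import Any, Dict, Union


def summarize_geo_metadata(geo_json: Union[str, bytes]) -> Dict[str, Any]:
    """Summarize GeoParquet "geo" file metadata (hypothetical helper)."""
    geo = json.loads(geo_json)  # json.loads accepts str or UTF-8 bytes
    primary = geo["primary_column"]  # the assigned ("active") geometry column
    column_meta = geo["columns"][primary]
    if "crs" not in column_meta:
        # GeoParquet: an absent "crs" key means OGC:CRS84 (longitude/latitude).
        crs: Any = "OGC:CRS84"
    else:
        # Either a PROJJSON object, or an explicit null meaning an unknown CRS.
        crs = column_meta["crs"]
    return {
        "version": geo["version"],
        "primary_column": primary,
        "encoding": column_meta["encoding"],
        "crs": crs,
    }
```

This is the information a plain pandas round-trip would not preserve, which is one reason a GeoDataFrame needs its own handling rather than being treated as just another DataFrame.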

nathanjmcdougall force-pushed the feature/249-support-geoparquet branch from 701248d to da71966 (July 25, 2024 08:06)
Development: successfully merging this pull request may close the issue "Support GeoParquet".
2 participants