[FEA]: Improve pyarrow integration/IO performance using geoarrow-python #1288
Comments
Hi @paleolimbot! Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can! |
Thanks for the feature request. @paleolimbot where is the CRS in the example? |
It's a property of the (Arrow) type!
from geoarrow.pyarrow import io
tbl = io.read_pyogrio_table("/vsizip/vsicurl/https://github.com/geoarrow/geoarrow-data/releases/download/v0.1.0/ns-water-basin_point.fgb.zip")
tbl["wkb_geometry"].type.crs
#> '{"$schema":"https://proj.org/schemas/v0.7/projjson.schema.json","type":"Projected...
The full serialization of the type is described in the 'extension types' section ( https://github.com/geoarrow/geoarrow/blob/main/extension-types.md ), and you can access it using |
Hey @paleolimbot ! Thanks for the update. I've been following your geoarrow work for a long while and am pretty excited to integrate it. I wrote a simple wrapper a few months ago before |
Yes! |
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem you would like to solve.
Now that geoarrow-pyarrow ( https://github.com/geoarrow/geoarrow-python ) is available and the GeoArrow specification has an initial 0.1 release, there are potential synergies we may be able to leverage given the common memory layout! Basically, geoarrow-pyarrow implements a pyarrow.DataType subclass for geometry with a type-level place to store the coordinate reference system. It would be very cool if cudf.Series.from_arrow() could handle these (or whatever the best interface is from your end).
I also think it has the potential to significantly speed up IO compared with the current geopandas.read_file() + cuspatial.GeoSeries.from_geopandas() (rough estimate from some musings below: assembling linestrings from a large-ish FlatGeobuf was about 20x faster).
Happy to implement anything in geoarrow-c or geoarrow-python that makes this easier! We're slowly working on getting both on conda-forge (they're on pip already).
Describe any alternatives you have considered
The closest thing that currently provides this functionality is from_geopandas(), with Shapely's to_ragged_array and from_ragged_array also providing similar buffer building/parsing capability.
Additional context
Some musings with a large-ish linestring dataset (with apologies if I'm missing some obvious usage I should be aware of):
There are more example datasets at https://geoarrow.org/data as well (although I'm sure you have many internally as well).
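For anyone unfamiliar with the ragged-array idea behind the "common memory layout" and the Shapely functions mentioned under alternatives, here is a minimal pure-Python sketch of how a set of linestrings can be stored as one flat coordinate list plus an offsets list. The helper names `to_ragged` and `from_ragged` are hypothetical and only illustrate the concept; they are not the Shapely, geoarrow-python, or cuSpatial APIs, and real implementations use typed contiguous buffers rather than Python lists.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


def to_ragged(lines: Sequence[Sequence[Coord]]) -> Tuple[List[Coord], List[int]]:
    """Flatten linestrings into one coordinate list plus offsets.

    Linestring i occupies coords[offsets[i]:offsets[i + 1]].
    (Hypothetical helper for illustration only.)
    """
    coords: List[Coord] = []
    offsets: List[int] = [0]
    for line in lines:
        coords.extend(line)
        offsets.append(len(coords))
    return coords, offsets


def from_ragged(coords: Sequence[Coord], offsets: Sequence[int]) -> List[List[Coord]]:
    """Rebuild the linestrings by slicing between consecutive offsets."""
    return [list(coords[start:stop]) for start, stop in zip(offsets, offsets[1:])]


# Two linestrings: one with 2 vertices, one with 3.
lines = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(2.0, 2.0), (3.0, 3.0), (4.0, 4.0)],
]
coords, offsets = to_ragged(lines)
roundtrip = from_ragged(coords, offsets)
```

Because every geometry lives in the same two buffers, handing the data to another library that understands the layout can, in principle, avoid building and parsing one geometry object per feature.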