Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Write columns with "short" type in gdb with OpenFileGDB driver #1321

Closed
remi-braun opened this issue Jan 10, 2024 · 7 comments · Fixed by #1358
Closed

Write columns with "short" type in gdb with OpenFileGDB driver #1321

remi-braun opened this issue Jan 10, 2024 · 7 comments · Fixed by #1358
Assignees
Milestone

Comments

@remi-braun
Copy link

Hello,

I am currently struggling to write a column with a short type in a GDB.
After some research, I found a way to write shapefiles with a difference between long a short integers with this solution (which works).

Expected behavior and actual behavior.

Both 'int32:4' and 'int32:10' types in schema give a long column in a GDB, instead of giving a short and a long column.

Steps to reproduce the problem.

Code

import geopandas as gpd
import fiona

gdb_path = "my_gdb.gdb"
layer = "B1_observed_event_a"
observed_event = gpd.read_file(gdb_path , layer=layer)
with fiona.collection(gdb_path, layer=layer) as source:
     s = source.schema

s["properties"]["obj_desc"] = "int32:4"
s["properties"]["det_method"] = "int32:10"

observed_event.to_file("my_gdb_copy.gdb", layer=layer, driver="OpenFileGDB", schema=s)

PS: I know I am using geopandas, but they implement seamlessly the Fiona schemas, this is why I am iting the issue here.

Output

2024-01-10_11h52_50

Operating system

Windows

Fiona and GDAL version and provenance

Conda (conda-forge)

@sgillies
Copy link
Member

@remi-braun Happy new year! OGR doesn't have a short integer type, only 32 and 64-bit integers. Neither does Fiona at this time, thus your layers are being constructed with 32 bit wide integer fields. I don't think there is any logic in the GDB driver to reduce the width at creation time.

Do you see different behavior if you use ogr2ogr or pyogrio?

@sgillies sgillies self-assigned this Jan 10, 2024
@remi-braun
Copy link
Author

Happy new year to you too 😉

I think pyogrio doesn't handle schemas, so I haven't tried.
2024-01-11_08h37_44

What's weird is that for ESRI a short isn't an int16 but also an int32, but I don't exactly know what means the end of int32:4.
Note that text:255 works for GDB, so the : mechanism is in some way already handled in OpenFileGDB.

And we made it all work for Shapefiles, so other drivers handle this mechanism.

@jorisvandenbossche
Copy link
Member

I think pyogrio doesn't handle schemas, so I haven't tried.

Pyogrio doesn't support the schema keyword (as that is a fiona specific parameter), but it certainly does support writing the different data types. But because the input for pyogrio is a geopandas DataFrame or numpy arrays, the data already has a schema, and pyogrio uses that (instead of letting the user specify it separately).
So if you ensure that your input data has an int16 column, pyogrio should pass that information through to GDAL:

import geopandas
from shapely.geometry import Point

gdf = geopandas.GeoDataFrame({"col": np.array([1, 2, 3], dtype="int16"), "geometry": [Point(i, i) for i in range(3)]})
gdf
#    col     geometry
# 0    1  POINT (0 0)
# 1    2  POINT (1 1)
# 2    3  POINT (2 2)

gdf.to_file("test_gdb.gdb", driver="OpenFileGDB", engine="pyogrio")

I am not fully sure how then to check independently whether it has actually written the correct data type to the OpenFileGDB, but ogrinfo indicates that it has:

$ ogrinfo test_gdb.gdb test_gdb
INFO: Open of `test_gdb.gdb'
      using driver `OpenFileGDB' successful.

Layer name: test_gdb
Geometry: Point
Feature Count: 3
Extent: (0.000000, 0.000000) - (2.000000, 2.000000)
Layer SRS WKT:
(unknown)
FID Column = OBJECTID
Geometry Column = SHAPE
col: Integer(Int16) (0.0)
OGRFeature(test_gdb):1
  col (Integer(Int16)) = 1
  POINT (0 0)

OGRFeature(test_gdb):2
  col (Integer(Int16)) = 2
  POINT (1 1)

OGRFeature(test_gdb):3
  col (Integer(Int16)) = 3
  POINT (2 2)

@jorisvandenbossche
Copy link
Member

And if you want to control the exact OpenFIleGDB types being used by GDAL, it seems to have a creation option COLUMN_TYPES that can be passed (see https://gdal.org/drivers/vector/openfilegdb.html#layer-creation-options, but didn't try this)


@sgillies OGR indeed only uses int32 or int64 data in its internal data model, but there is the concept of "sub type" to annotate a type with additional information (I assume it doesn't change how the data is represented internally, still int32, but then it is used as a hint when writing): https://gdal.org/api/vector_c_api.html#_CPPv415OGRFieldSubType, and there is has a OFSTInt16.
Pyogrio uses this when the input data has a bitwidth < 32, and based on the example above, it seems to have effect. Fiona could use this as well. It's already declared:

Fiona/fiona/ogrext3.pxd

Lines 102 to 105 in 195579d

ctypedef int OGRFieldSubType
cdef int OFSTNone = 0
cdef int OFSTBoolean = 1
cdef int OFSTInt16 = 2

and could set it like is already done for bool subtype as well:

Fiona/fiona/ogrext.pyx

Lines 1293 to 1295 in 195579d

if value == 'bool':
value = 'int32'
field_subtype = OFSTBoolean

@remi-braun
Copy link
Author

remi-braun commented Jan 11, 2024

With pyogrio, I successfully wrote short dtypes!
However, geopandas doesn't read corretly the input type, so I had to change the type of every column (which could be time consuming):

import geopandas as gpd
gdb_path = "my_gdb.gdb"
layer = "B1_observed_event_a"

# Read layer
observed_event = gpd.read_file(gdb_path , layer=layer)

# Set correct types
observed_event.event_type = observed_event.event_type.astype("int32")
observed_event.obj_desc = observed_event.obj_desc.astype("int16")
observed_event.notation = observed_event.notation.astype("str")  # How can I set str:255 ?
observed_event.det_method = observed_event.det_method.astype("int16")
observed_event.dmg_src_id = observed_event.dmg_src_id.astype("int32")

# Write back in gdb
observed_event.to_file("my_gdb_copy.gdb", layer=layer, driver="OpenFileGDB", engine="pyogrio")

However, with fiona's schema, I succeeded to set str:255 as field, but not with pyogrio. How can I do that ?
2024-01-11_10h18_02

PS: the goal of all this is to allow the GDB domains to be recognized automatically, but I don't know if it will work even with the correct types

@sgillies sgillies added this to the 1.10 milestone Mar 5, 2024
sgillies added a commit that referenced this issue Mar 12, 2024
@sgillies
Copy link
Member

@remi-braun I've begun working on this and have 2 questions.

  1. Is not the field width specifier 4 in int32:4 specific to Shapefiles? I'd love to not have to think about this anymore if we don't have to. It's not clear to me that OGR will coerce a 4 char wide OFTInt to OFTInt16 when saving.
  2. What would you think about an "int16" type as in Add an "int16" property type #1358? If your GPKG file has a "short" field, Fiona would report "int16", and if you write a new schema from Fiona with "int16" type, it should manifest in GPKG as a "short".

@remi-braun
Copy link
Author

@sgillies thanks for taking this seriously!

  1. Sadly, I honestly don't know the specifics on this. My only will was to integrate shapefiles into ESRI's GeoDataBases (don't blame me I am forced to 😬) without using arcpy...
    My knowledge on this only comes from the answer I shared in the initial issue 😞
  2. Whatever works for my usecase would be fine for me 😉

sgillies added a commit that referenced this issue Mar 29, 2024
* Add an "int16" property type

Resolves #1321

* Update reference in change log

* Code cleanup to improve readability
sgillies added a commit that referenced this issue Apr 3, 2024
* Add an "int16" property type

Resolves #1321

* Update reference in change log

* Code cleanup to improve readability

* Refactor OGR and Fiona feature builders to replace ugly conditionals

Instead we dispatch to cdef classes kept in several new mappings.

* Refactor OGR and Fiona feature builders to replace ugly conditionals

Instead we dispatch to cdef classes kept in several new mappings.

* xfail a JSON property test for GDAL < 3.5

* xfail another

* New feature builder implementation with cleaner OO logic

* Fix imports and update change log

* Fix detection of tz support in DateTimeField

* Register property setters to OGR field type and value type

To reproduce behavior of Fiona 1.9

* Make lists JSON serializable
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants