validate Arrow types, not just Julia types, for Onda.jl-written columns #113

Open
jrevels opened this issue Jan 19, 2022 · 0 comments
jrevels commented Jan 19, 2022

For example, to be specification-compliant (and thus easily portable to non-Julia systems, as intended), file_path should be stored in Arrow as Utf8 and correspond to a valid URI or relative file path. Of course, this still allows the application layer to use Arrow's custom metadata for application-layer type conversion (e.g. like Arrow.jl does).

However, Legolas/Onda enforces only the Julia type, not the Arrow type, and Onda purposefully leaves the Julia type unrestricted (e.g. file_path::Any) in order to support generic file path types.

This can cause issues for users, though, if their path type does not define the proper conversion to Arrow's Utf8. For example, S3Paths currently exhibit this problem (ref JuliaCloud/AWSS3.jl#184).

It'd be good to add some sort of validation to check that Onda.jl-written columns contain the expected Arrow types.
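
As a rough illustration of what such a check could look like, here is a minimal sketch that reads a written table back with Arrow.jl's `convert=false` option (so extension metadata is not used to rebuild Julia types) and inspects the element type of the raw column. The helper name `stored_as_utf8` is hypothetical and not part of Onda.jl or Legolas.jl. The element type is used as a proxy for the Arrow type, on the assumption that only Arrow string columns (Utf8, and also LargeUtf8, which a stricter check would need to reject) surface as string-typed columns.

```julia
using Arrow, Tables

# Hypothetical helper, not an Onda.jl/Legolas.jl API: returns true if the
# column `name` of an unconverted Arrow table holds string elements.
function stored_as_utf8(table, name::Symbol)
    T = eltype(Tables.getcolumn(table, name))
    return T <: Union{Missing,AbstractString}
end

# Write a table to an in-memory buffer and read it back without applying
# Julia-side extension-type conversion.
function roundtrip_unconverted(rows)
    io = IOBuffer()
    Arrow.write(io, rows)
    seekstart(io)
    return Arrow.Table(io; convert=false)
end

# A plain string path column, as the specification expects.
good = roundtrip_unconverted((file_path=["recordings/a.lpcm"],))

# A struct-like path value, standing in for a path type that lacks a
# conversion to a string on the Arrow side.
bad = roundtrip_unconverted((file_path=[(bucket="b", key="a.lpcm")],))

stored_as_utf8(good, :file_path) || error("file_path is not stored as a string column")
stored_as_utf8(bad, :file_path) && error("expected the struct-valued column to be rejected")
```

A check like this would run after writing, so it only catches the problem once the file exists. Checking before writing would require knowing how each path type maps to Arrow, which is the part that varies by type.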

Unfortunately, Legolas doesn't provide a way to enforce the Arrow type, though maybe it should? The right approach here might be to add this kind of feature to Legolas, then just make use of it here.
