Error writing to a partitioned table: it is not yet supported to write to hive partitions with datatype Dictionary(UInt16, Utf8) #7891

Closed
alamb opened this issue Oct 20, 2023 · 2 comments · Fixed by #7896

alamb commented Oct 20, 2023

@suremarc got this error when writing to a partitioned table:

This feature is not implemented: it is not yet supported to write to hive partitions with datatype Dictionary(UInt16, Utf8)

Here is a repro using datafusion-cli:

CREATE EXTERNAL TABLE lz4_raw_compressed_larger
STORED AS PARQUET
PARTITIONED BY (partition)
LOCATION 'data/';

INSERT INTO lz4_raw_compressed_larger VALUES ('non-partition-value', 'partition');

Here's a zip file with a single file in it, data/partition=A/lz4_raw_compressed_larger.parquet.

I noticed the unit tests specify the schema explicitly, but I am guessing if you have DataFusion infer the schema, the partition columns are encoded as dictionaries. I think this will limit the usefulness of this feature if partitioned writes don't work with tables whose schemas are inferred.

Originally posted by @suremarc in #7801 (comment)

@devinjdangelo

Hm, I am a little confused about why DataFusion is inferring the schema of UTF8 data as Dictionary(some int type, UTF8).

🤔 will have to look into it. It does seem that #7891, #7892, and some of the inconveniences reported by @theelderbeever in #7860 are all related.

Perhaps the partitioning code could accept any arrow array type which can be explicitly cast to UTF8, rather than only strictly UTF8... I assume since these Dictionary columns are representing string data, they can be cast to a plain UTF8 array without panic/error.
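
To make the casting idea concrete, here is a minimal sketch in plain Rust, using only the standard library. It is not the arrow-rs API and not the change made in #7896: `DictColumn` and `to_plain` are hypothetical names invented for illustration. It assumes a dictionary-encoded string column is a list of integer keys pointing into a list of distinct string values, and shows that expanding it to plain strings only fails when a key is out of range.

```rust
/// Hypothetical stand-in for a dictionary-encoded string column.
/// This is NOT the arrow-rs DictionaryArray type, only an illustration.
struct DictColumn {
    /// One key per row; `None` represents a null row.
    keys: Vec<Option<u16>>,
    /// The distinct string values that the keys index into.
    values: Vec<String>,
}

impl DictColumn {
    /// Expand the column into one plain string per row.
    /// Returns an error (rather than panicking) if a key is out of range.
    fn to_plain(&self) -> Result<Vec<Option<String>>, String> {
        self.keys
            .iter()
            .map(|key| match key {
                None => Ok(None),
                Some(k) => self
                    .values
                    .get(*k as usize)
                    .cloned()
                    .map(Some)
                    .ok_or_else(|| format!("key {k} out of range")),
            })
            .collect()
    }
}

fn main() {
    // Partition values "A", "B", "A", null stored as keys 0, 1, 0, null.
    let col = DictColumn {
        keys: vec![Some(0), Some(1), Some(0), None],
        values: vec!["A".to_string(), "B".to_string()],
    };
    let plain = col.to_plain().expect("all keys are in range");
    println!("{plain:?}");
}
```

Under these assumptions, a partitioning routine that only understands plain strings could call something like `to_plain` first and then derive the `partition=<value>` directory name from each row as usual.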

@devinjdangelo

Ok, I read the arrow-rs docs on dictionary array types, so I understand what that means now... I took a stab at solving this in #7896
