The Hudi extensions provide the ability to add field IDs to the parquet schema when writing with Hudi. This is a requirement for some engines, like BigQuery and Snowflake, when reading an Iceberg table. If you are not planning on using Iceberg, then you do not need to add these to your Hudi writers.
- Add the extensions jar (`xtable-hudi-extensions-0.2.0-SNAPSHOT-bundled.jar`) to your class path. For example, if you're using the Hudi quick-start guide for Spark, you can add `--jars xtable-hudi-extensions-0.2.0-SNAPSHOT-bundled.jar` to the end of the command.
- Set the following configurations in your writer options:
  ```
  hoodie.avro.write.support.class: org.apache.xtable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds
  hoodie.client.init.callback.classes: org.apache.xtable.hudi.extensions.AddFieldIdsClientInitCallback
  hoodie.datasource.write.row.writer.enable: false
  ```
  (RowWriter support is coming soon.)
- Run your existing code that uses Hudi writers.
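As a concrete illustration, here is a minimal sketch of a Spark datasource write with these options set, modeled on the Hudi quick-start. The DataFrame `df`, table name, key fields, and base path are placeholder assumptions you would replace with your own values.

```scala
import org.apache.spark.sql.SaveMode

// Minimal sketch: write a DataFrame `df` as a Hudi table with the
// field-ID extension configs from the steps above. Table name, key
// fields, and path below are placeholders, not required values.
df.write.format("hudi").
  option("hoodie.table.name", "my_table").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.avro.write.support.class",
    "org.apache.xtable.hudi.extensions.HoodieAvroWriteSupportWithFieldIds").
  option("hoodie.client.init.callback.classes",
    "org.apache.xtable.hudi.extensions.AddFieldIdsClientInitCallback").
  option("hoodie.datasource.write.row.writer.enable", "false"). // RowWriter not yet supported
  mode(SaveMode.Append).
  save("/tmp/hudi_table")
```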
If you want to use XTable with Hudi streaming ingestion to sync each commit into other table formats, follow these steps:
- Add the extensions jar (`xtable-hudi-extensions-0.2.0-SNAPSHOT-bundled.jar`) to your class path.
- Add `org.apache.xtable.hudi.sync.XTableSyncTool` to your list of sync classes.
- Set the following configurations based on your preferences:
  ```
  hoodie.xtable.formats.to.sync: "ICEBERG,DELTA"
  hoodie.xtable.target.metadata.retention.hr: 168
  ```
  You can also sync to a single format. The default retention for the target format metadata is 168 hours.
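For reference, a Hudi Streamer launch with these settings might look like the sketch below, assuming Hudi 0.14+ (where the utility class is `org.apache.hudi.utilities.streamer.HoodieStreamer`, formerly `HoodieDeltaStreamer`). The bundle jar names, table type, table name, and base path are placeholder assumptions, and the source and schema-provider options a real pipeline needs are omitted.

```sh
# Sketch only: replace jar paths, table name, and base path with your own,
# and add your usual source/schema provider configuration.
spark-submit \
  --jars xtable-hudi-extensions-0.2.0-SNAPSHOT-bundled.jar \
  --class org.apache.hudi.utilities.streamer.HoodieStreamer \
  hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --target-table my_table \
  --target-base-path /tmp/hudi_table \
  --enable-sync \
  --sync-tool-classes org.apache.xtable.hudi.sync.XTableSyncTool \
  --hoodie-conf hoodie.xtable.formats.to.sync=ICEBERG,DELTA \
  --hoodie-conf hoodie.xtable.target.metadata.retention.hr=168
```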