[Bug]: Writing partitioned data with Managed IcebergIO fails #31943

Closed · 2 of 16 tasks · arthurpessoa opened this issue Jul 22, 2024 · 2 comments · Fixed by #32102
arthurpessoa commented Jul 22, 2024

What happened?

After following the Dataflow example (https://cloud.google.com/dataflow/docs/guides/write-to-iceberg), I'm trying to write to a table with a partition spec, but the pipeline breaks at the write step.

Given the following table:

    val catalogConfig = IcebergCatalogConfig.builder()
        .setIcebergCatalogType("hadoop")
        .setName("hadoop.catalog")
        .setWarehouseLocation("file:///warehouse")
        .build()


    val icebergSchema = IcebergSchema(
        Types.NestedField.required(1, "id", Types.LongType.get()),
        Types.NestedField.required(2, "name", Types.StringType.get()),
        Types.NestedField.required(3, "day", Types.StringType.get())
    )

    val partitionSpec = PartitionSpec.builderFor(icebergSchema)
        .identity("id")
        .build()

    catalogConfig.catalog().createTable(
        TableIdentifier.of("table"),
        icebergSchema,
        partitionSpec
    )
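
Context (not part of the original repro): with identity("id") in the partition spec, every data file has to be written under a concrete partition value, which Iceberg writers normally derive per record with PartitionKey. A minimal sketch of that derivation, assuming IcebergSchema above aliases org.apache.iceberg.Schema and using the Iceberg generics API:

    import org.apache.iceberg.PartitionKey
    import org.apache.iceberg.data.GenericRecord

    // A record matching the repro schema; identity("id") makes its partition value the id itself.
    val record = GenericRecord.create(icebergSchema)
    record.setField("id", 0L)
    record.setField("name", "Alice")
    record.setField("day", "01")

    val partitionKey = PartitionKey(partitionSpec, icebergSchema)
    partitionKey.partition(record)   // partitionKey now holds id=0 for this record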

When writing with the following code:

    val TABLE_ROWS: List<String> = listOf(
        "{\"id\":0, \"name\":\"Alice\", \"day\":\"01\"}",
        "{\"id\":1, \"name\":\"Bob\", \"day\":\"02\"}",
        "{\"id\":2, \"name\":\"Charles\", \"day\":\"02\"}"
    )

    val SCHEMA: Schema = Schema.builder()
        .addInt64Field("id")
        .addStringField("name")
        .addStringField("day")
        .build()


    pipeline.apply(Create.of(TABLE_ROWS))
        .apply(JsonToRow.withSchema(SCHEMA))
        .apply(
            Managed.write(Managed.ICEBERG).withConfig(
                mapOf(
                    "table" to "table",
                    "catalog_config" to mapOf(
                        "catalog_name" to "hadoop.catalog",
                        "warehouse_location" to "file:///warehouse",
                        "catalog_type" to "hadoop"
                    )
                )
            )
        )

Then it breaks with the following stacktrace:

Caused by: java.lang.IllegalArgumentException: Partition must not be null when creating data writer for partitioned spec
	at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
	at org.apache.iceberg.parquet.Parquet$DataWriteBuilder.build(Parquet.java:704)
	at org.apache.beam.sdk.io.iceberg.RecordWriter.<init>(RecordWriter.java:73)
	at org.apache.beam.sdk.io.iceberg.RecordWriter.<init>(RecordWriter.java:47)
	at org.apache.beam.sdk.io.iceberg.WriteUngroupedRowsToFiles$WriteUngroupedRowsToFilesDoFn.createAndInsertWriter(WriteUngroupedRowsToFiles.java:219)
	at org.apache.beam.sdk.io.iceberg.WriteUngroupedRowsToFiles$WriteUngroupedRowsToFilesDoFn.getWriterIfPossible(WriteUngroupedRowsToFiles.java:242)
	at org.apache.beam.sdk.io.iceberg.WriteUngroupedRowsToFiles$WriteUngroupedRowsToFilesDoFn.processElement(WriteUngroupedRowsToFiles.java:258)
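
Context (not part of the original report): judging from the stacktrace, the check fires inside Iceberg's Parquet.DataWriteBuilder, which refuses to build a data writer for a partitioned spec unless a partition value was supplied via withPartition(...). A minimal Kotlin sketch of that contract against the table created above, using the Iceberg generics API (illustrative only, not the Beam connector's internal code; the output path is hypothetical):

    import org.apache.iceberg.PartitionKey
    import org.apache.iceberg.catalog.TableIdentifier
    import org.apache.iceberg.data.GenericRecord
    import org.apache.iceberg.data.Record
    import org.apache.iceberg.data.parquet.GenericParquetWriter
    import org.apache.iceberg.parquet.Parquet

    val table = catalogConfig.catalog().loadTable(TableIdentifier.of("table"))

    // One record matching the repro schema, with its partition value derived as in the sketch above.
    val record = GenericRecord.create(table.schema())
    record.setField("id", 0L)
    record.setField("name", "Alice")
    record.setField("day", "01")
    val partitionKey = PartitionKey(table.spec(), table.schema())
    partitionKey.partition(record)

    // build() succeeds for a partitioned spec only when withPartition(...) was called;
    // omitting it raises "Partition must not be null when creating data writer for partitioned spec".
    val writer = Parquet.writeData(table.io().newOutputFile("file:///warehouse/table/data/example.parquet")) // hypothetical path
        .schema(table.schema())
        .createWriterFunc(GenericParquetWriter::buildWriter)
        .withSpec(table.spec())
        .withPartition(partitionKey)
        .overwrite()
        .build<Record>()

    writer.write(record)
    writer.close()

Dropping the withPartition(partitionKey) line while keeping withSpec(table.spec()) reproduces the same IllegalArgumentException, matching the failure at RecordWriter.<init> in the stacktrace.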

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

chamikaramj (Contributor) commented:

cc: @ahmedabu98

ahmedabu98 (Contributor) commented:

Taking a look... thanks for the repro @arthurpessoa
