Error: table_type missing from table parameters when loading table from Hive metastore #1150

Closed
edgarrmondragon opened this issue Sep 9, 2024 · 5 comments · Fixed by #1332

Comments

@edgarrmondragon
Contributor

Apache Iceberg version

main (development)

Please describe the bug 🐞

I'm (or rather, a user of tap-iceberg is) running into the following error when trying to load a Hive table using pyiceberg.

pyiceberg.exceptions.NoSuchPropertyException: Property table_type missing, could not determine type: bronze.my_iceberg_table

The call in question is https://github.com/shaped-ai/tap-iceberg/blob/38064b3aaca5394ba1482970e790d3e2f6020946/tap_iceberg/tap.py#L94.

It seems the loaded table is missing the table type parameter in

def _get_hive_table(self, open_client: Client, database_name: str, table_name: str) -> HiveTable:
    try:
        return open_client.get_table(dbname=database_name, tbl_name=table_name)

?

Thanks in advance if this turns out to be user error 😃

@kevinjqliu
Contributor

load_table is a two-step process. First it fetches the table from HMS using get_table, then it converts the Hive table into an Iceberg table (_convert_hive_into_iceberg).

with self._client as open_client:
    hive_table = self._get_hive_table(open_client, database_name, table_name)
return self._convert_hive_into_iceberg(hive_table)

The error here comes from the second step. The Hive table is expected to have a property "table_type" whose value is the string "iceberg" (compared case-insensitively).

def _convert_hive_into_iceberg(self, table: HiveTable) -> Table:
    properties: Dict[str, str] = table.parameters
    if TABLE_TYPE not in properties:
        raise NoSuchPropertyException(
            f"Property table_type missing, could not determine type: {table.dbName}.{table.tableName}"
        )
    table_type = properties[TABLE_TYPE]
    if table_type.lower() != ICEBERG:
        raise NoSuchIcebergTableError(
            f"Property table_type is {table_type}, expected {ICEBERG}: {table.dbName}.{table.tableName}"
        )

Who created the table in this case? When PyIceberg creates the table, it injects the table_type property

tbl = self._convert_iceberg_into_hive(staged_table)

@edgarrmondragon
Contributor Author

Who created the table in this case? When PyIceberg creates the table, it injects the table_type property

I suppose it was created by a third-party and not by HiveCatalog.create_table. Are only tables created by pyiceberg supported here?

@kevinjqliu
Contributor

Are only tables created by pyiceberg supported here?

Anyone can create an iceberg table using HMS, which can be read by PyIceberg. In HMS, the assumption is that iceberg tables have a specific property set so that engines can distinguish between hive and iceberg tables.

In this case, the table was created as a "hive table" and not an "iceberg table".
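
To see which case a given table falls into, one could inspect the parameters map that HMS returns for it. The following is a minimal sketch, assuming the parameters are available as a plain mapping (for a real table, this would be the `parameters` attribute of the object returned by `get_table`). The function name `classify_hms_table` is hypothetical and not part of PyIceberg; the two constants mirror the names used in the check quoted above.

```python
from typing import Mapping

TABLE_TYPE = "table_type"  # property key inspected by the check quoted above
ICEBERG = "iceberg"        # expected value, compared case-insensitively


def classify_hms_table(parameters: Mapping[str, str]) -> str:
    """Return 'iceberg', 'not-iceberg', or 'missing' for an HMS parameters map.

    Hypothetical helper for diagnosis only; it mirrors the two branches of
    the check shown earlier in this thread.
    """
    table_type = parameters.get(TABLE_TYPE)
    if table_type is None:
        # This is the branch that raises NoSuchPropertyException.
        return "missing"
    if table_type.lower() != ICEBERG:
        # This is the branch that raises NoSuchIcebergTableError.
        return "not-iceberg"
    return "iceberg"


if __name__ == "__main__":
    print(classify_hms_table({"EXTERNAL": "TRUE"}))
    print(classify_hms_table({"table_type": "ICEBERG"}))
```

A result of "missing" corresponds to the error reported in this issue.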

@edgarrmondragon
Contributor Author

Anyone can create an iceberg table using HMS, which can be read by PyIceberg. In HMS, the assumption is that iceberg tables have a specific property set so that engines can distinguish between hive and iceberg tables.

In this case, the table was created as a "hive table" and not an "iceberg table".

@kevinjqliu thanks for the info 🙏. Just two more questions:

  1. is there a way to set this property manually?
  2. would doing that break something?

@kevinjqliu
Contributor

is there a way to set this property manually?

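You can use an engine (like Spark/Trino) to interact with the Hive table and add the extra table parameter. Alternatively, a hacky way is to use the Hive client in pyiceberg directly. The snippet below comes from the table-rename code path and shows the get_table / alter_table pattern; for this case you would modify tbl.parameters instead of the names: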

with self._client as open_client:
    tbl = open_client.get_table(dbname=from_database_name, tbl_name=from_table_name)
    tbl.dbName = to_database_name
    tbl.tableName = to_table_name
    open_client.alter_table(dbname=from_database_name, tbl_name=from_table_name, new_tbl=tbl)

It should work, but definitely test it out first.
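
Adapted to setting the property rather than renaming, the same pattern might look like the sketch below. It is an untested illustration, not PyIceberg API: `set_iceberg_table_type` is a hypothetical helper, and `open_client` is assumed to be any object exposing `get_table` and `alter_table` with the keyword arguments used in the snippet above. The value "ICEBERG" is an assumption about the conventional spelling; the check quoted earlier lowercases the value before comparing, so case should not matter to it.

```python
from typing import Any

TABLE_TYPE = "table_type"
ICEBERG_VALUE = "ICEBERG"  # assumption: conventional spelling; the check is case-insensitive


def set_iceberg_table_type(open_client: Any, database_name: str, table_name: str) -> None:
    """Add the table_type parameter to an existing HMS table (hypothetical helper).

    `open_client` is assumed to behave like the opened Hive client in the
    snippet above: get_table(dbname=, tbl_name=) returns a table object with
    a `parameters` mapping, and alter_table(dbname=, tbl_name=, new_tbl=)
    writes the modified object back.
    """
    tbl = open_client.get_table(dbname=database_name, tbl_name=table_name)
    # Copy the existing parameters so no other property is lost.
    parameters = dict(tbl.parameters or {})
    parameters[TABLE_TYPE] = ICEBERG_VALUE
    tbl.parameters = parameters
    open_client.alter_table(dbname=database_name, tbl_name=table_name, new_tbl=tbl)
```

With a HiveCatalog this would presumably be called inside `with catalog._client as open_client:`, which relies on a private attribute and may change between releases. Setting the parameter only helps if the table is already backed by Iceberg metadata; as far as I can tell, loading also expects a metadata_location parameter to be present.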

would doing that break something?

Nope, adding that specific parameter in HMS is how an Iceberg table is defined.
You can see examples in the core Iceberg library and in an engine like Trino.
