When there are 2 or more records with the same hoodie key in a single parquet file, only one of the records gets updated in the upsert flow #333

Closed
suniluber opened this issue Mar 5, 2018 · 4 comments

Comments

@suniluber
Contributor

suniluber commented Mar 5, 2018

There may be situations where a single parquet file contains multiple records with the same hoodie key. Consider a scenario in which we have 3 parquet files, all three contain a record with the same hoodie key, and one of the three contains several records with that key. When a new record with the same hoodie key is upserted, the update is applied to both files that hold a single copy of the key, but only 1 record gets updated in the third file, the one holding multiple copies.
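The symptom can be reproduced with a minimal Python sketch, assuming the merge step copies the incoming updates into a per-file map keyed by record key and removes each entry once it has been written. The `merge_file` function and the `(record_key, payload)` tuple layout are hypothetical illustrations, not Hudi's actual (Java) merge code.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (record_key, payload)
Record = Tuple[str, str]


def merge_file(existing: List[Record], updates: Dict[str, str]) -> List[Record]:
    """Rewrite one file's records, applying each incoming update at most once.

    Assumption: updates are kept in a map keyed by record key, and an entry
    is consumed (popped) as soon as it is written for the first matching row.
    """
    pending = dict(updates)  # per-file copy, so every file sees the same updates
    out: List[Record] = []
    for key, payload in existing:
        if key in pending:
            # First row with this key gets the new payload and consumes the update.
            out.append((key, pending.pop(key)))
        else:
            # Later rows with the same key (or unrelated keys) keep the old payload.
            out.append((key, payload))
    return out


updates = {"k1": "v1"}
print(merge_file([("k1", "v0")], updates))                 # file with one copy
print(merge_file([("k1", "v0"), ("k1", "v0b")], updates))  # file with two copies
```

Under this assumption, a file holding a single copy of the key gets the new value, while in a file holding two copies only the first row is rewritten and the second keeps its old payload, which matches the behavior reported above.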

@ovj @vinothchandar @jianxu @n3nash

@suniluber suniluber changed the title from "When there are 2 or more records with same Hadoop_Rowkey and Hadoop_timestamp in a single parquet file, only one of the records gets updated in the Upsert flow" to "When there are 2 or more records with same hoodie key in a single parquet file, only one of the records gets updated in the Upsert flow" Mar 5, 2018
@vinothchandar
Member

We discussed this face to face. FWIW, this is the correct and expected behavior: we don't expect a key to be present multiple times in the partition.
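Because the design assumes each key appears at most once per partition, a duplicate check over a partition's record keys can surface the situation described in this issue before an upsert. A minimal sketch; `duplicate_keys` is a hypothetical helper, and reading the keys out of the parquet files is left out.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_keys(keys: Iterable[str]) -> List[str]:
    """Return, sorted, every record key that occurs more than once."""
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical keys collected from all files of one partition
print(duplicate_keys(["k1", "k2", "k1", "k3", "k2"]))
```

An empty result means the partition satisfies the one-key-per-partition assumption; any returned key would be partially updated in the scenario above.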

@vinothchandar
Member

Who's picking this up?

@suniluber
Contributor Author

I have made the update to the code and added tests. Will send a pull request soon.

@vinothchandar
Member

Closing this since the PR has been inactive for a while. Please reopen if needed.
