Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Allow disabling of best effort de-duplication #272

Open
adelcast opened this issue May 19, 2020 · 0 comments
Open

Allow disabling of best effort de-duplication #272

adelcast opened this issue May 19, 2020 · 0 comments

Comments

@adelcast
Copy link

The connector is currently adding an InsertId per row, which is used by BigQuery to dedupe rows that have the same insertId (in a 1 minute window). Using insertIds throttles the ingestion rate to a maximum of 100k rows per second & 100 MB/s.

Insertions without a insertId disable best effort de-duplication [1], which increases the ingestion quota to a maximum of 1 GB/s. For high throughput applications, its desirable to disable dedupe, handling duplication on the query side.

A config option to disable best effort de-duplication (removing insertIds) would be a great knob to have, which could improve ingestion by 10x.

[1] https://cloud.google.com/bigquery/streaming-data-into-bigquery#disabling_best_effort_de-duplication

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant