Must have support for batching when writing to Kafka #9

Closed
ryancrawcour opened this issue Mar 20, 2019 · 8 comments
Labels
blocked Needing somebody else to do something somewhere else first feature-request New feature or request

Comments

@ryancrawcour
Contributor

When reading from Cosmos DB and writing to Kafka, batching should be the default behaviour.
Batch size should default to a chosen value, but be configurable by the user.

@ryancrawcour ryancrawcour added this to the P0 milestone Mar 20, 2019
@ryancrawcour
Contributor Author

ryancrawcour commented Mar 20, 2019

Config -

batch.size
Batch size between 1 (dedicated PutItemRequest for each record) and 25 (which is the maximum number of items in a BatchWriteItemRequest)

Type: int
Default: 1
Importance: high
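
As a rough illustration of the proposed setting, the sketch below reads batch.size from a plain string map, applies the default of 1, and rejects values outside 1 to 25. The class BatchSizeSetting and its parse method are hypothetical names for this example, not connector code, and the range is taken as stated in the comment above.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper (not part of the connector): parses and validates
// the proposed batch.size setting using only the JDK.
final class BatchSizeSetting {
    static final String KEY = "batch.size";
    static final int DEFAULT = 1; // default proposed in the comment above
    static final int MIN = 1;     // lower bound proposed in the comment above
    static final int MAX = 25;    // upper bound proposed in the comment above

    // Returns the configured batch size, or DEFAULT when the key is absent.
    static int parse(Map<String, String> config) {
        String raw = config.get(KEY);
        if (raw == null) {
            return DEFAULT;
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(KEY + " must be an integer: " + raw, e);
        }
        if (value < MIN || value > MAX) {
            throw new IllegalArgumentException(
                KEY + " must be between " + MIN + " and " + MAX + ", got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse(Collections.emptyMap()));
        System.out.println(parse(Collections.singletonMap(KEY, "10")));
    }
}
```

In a real Kafka Connect connector this check would more likely live in the connector's config definition; the plain-map version here only keeps the example dependency-free.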

@marcelaldecoa marcelaldecoa self-assigned this Apr 10, 2019
@marcelaldecoa
Member

The batch size is defined using the property connect.cosmosdb..task.batch.size and is used to set MaxItemCount in the Change Feed options: changeFeedOptions.setMaxItemCount(setting.batchSize).

@ryancrawcour
Contributor Author

We must ensure that reading from Cosmos DB in a batch, and writing to Kafka in a batch, is supported in the new Java code.

@ryancrawcour
Contributor Author

related #148 and #8

@brandynbrown brandynbrown removed this from the P1 milestone Nov 23, 2020
@ryancrawcour ryancrawcour self-assigned this Dec 3, 2020
@brandynbrown brandynbrown added blocked Needing somebody else to do something somewhere else first refine Issues needing refinement and removed batching labels Dec 7, 2020
@ryancrawcour
Contributor Author

Officially, you can't batch explicitly when using KafkaProducer and ProducerRecord, but you can get batching by configuring some properties in ProducerConfig.

batch.size - from the documentation: the producer batches records that are going to the same partition into requests and sends them at once.

The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes. No attempt will be made to batch records larger than this size.
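
A minimal sketch of the producer settings involved, assuming string keys and values. batch.size and linger.ms are standard Kafka producer configuration names; the helper class BatchingProducerProps, the broker address, and the chosen values are hypothetical and only for illustration. The sketch uses java.util.Properties alone so it compiles without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical helper (not part of the connector): builds producer settings
// that encourage batching, using the standard Kafka producer config keys.
final class BatchingProducerProps {
    private static final String STRING_SERIALIZER =
        "org.apache.kafka.common.serialization.StringSerializer";

    static Properties build(String bootstrapServers, int batchSizeBytes, long lingerMs) {
        if (batchSizeBytes < 0 || lingerMs < 0) {
            throw new IllegalArgumentException("batch size and linger must be non-negative");
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer", STRING_SERIALIZER);
        props.setProperty("value.serializer", STRING_SERIALIZER);
        // Upper bound, in bytes, for one per-partition batch of records.
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        // Time the producer may wait for more records before sending a
        // batch that is not yet full.
        props.setProperty("linger.ms", Long.toString(lingerMs));
        return props;
    }

    public static void main(String[] args) {
        // Example values only; tune them for the actual workload.
        Properties props = build("localhost:9092", 32 * 1024, 20);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With the kafka-clients library available, the resulting Properties object can be passed to the KafkaProducer constructor. Note that batch.size is a size in bytes, not a record count, so it does not map directly onto a "number of items per batch" setting.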

@ryancrawcour
Contributor Author

Some more information on the producer client and batching:
https://aiokafka.readthedocs.io/en/stable/producer.html

@ryancrawcour ryancrawcour removed the refine Issues needing refinement label Dec 8, 2020
@ryancrawcour
Contributor Author

Does Cosmos DB support receiving a batch of items from the ChangeFeed at once?
As Marcel says above, there is MaxItemCount that can be used, but does it buffer internally, or does it set how many items the ChangeFeed returns for each poll interval?

If we batch Cosmos DB messages, what do we do with the checkpoint and watermarks?
E.g. Cosmos could send us 5 messages while we're configured to write 10 in a batch.
Is there any advantage in doing this?

The disadvantage is that if the connector fails while receiving the 6th message, before it has flushed anything to Kafka, we lose the 5 messages currently buffered. Cosmos DB will consider them already delivered to the connector, so it won't send them again.
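
One way to reason about the failure case above is to advance the checkpoint only after a batch has been flushed, so that buffered but unwritten items are read again after a restart. The sketch below is a plain-Java simulation of that idea under stated assumptions: CheckpointedBatcher is a hypothetical class, the sink stands in for the Kafka write, and the checkpoint is a simple counter rather than a real Cosmos DB continuation token. Whether the Change Feed checkpoint can actually be deferred this way in the connector is not established in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer change-feed items and advance the checkpoint
// only after the whole batch has been handed to the sink.
final class CheckpointedBatcher {
    private final int batchSize;
    private final Consumer<List<String>> sink; // stands in for the Kafka write
    private final List<String> buffer = new ArrayList<>();
    private long checkpoint = 0; // count of items known to be written

    CheckpointedBatcher(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers one item and flushes once the batch is full.
    void accept(String item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Writes the buffered items, then moves the checkpoint forward.
    // If the sink throws, the checkpoint and the buffer stay unchanged.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        checkpoint += buffer.size();
        buffer.clear();
    }

    long checkpoint() { return checkpoint; }

    int buffered() { return buffer.size(); }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        CheckpointedBatcher batcher = new CheckpointedBatcher(10, written::add);
        for (int i = 0; i < 5; i++) {
            batcher.accept("doc-" + i);
        }
        // Five items are buffered, nothing is written, checkpoint has not moved.
        System.out.println("buffered=" + batcher.buffered() + " checkpoint=" + batcher.checkpoint());
        batcher.flush();
        System.out.println("buffered=" + batcher.buffered() + " checkpoint=" + batcher.checkpoint());
    }
}
```

In this model a crash after the 5th item leaves the checkpoint at 0, so a restarted reader would resume from the start and see the same 5 items again, trading possible duplicates for no data loss.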

@ryancrawcour
Contributor Author

For this first pass we will park batch support and come back to it later.

@ryancrawcour ryancrawcour removed the gold label Dec 8, 2020
@brandynbrown brandynbrown added this to the M3 milestone Jan 7, 2021
@brandynbrown brandynbrown removed this from the M3 milestone Jan 25, 2021
@brandynbrown brandynbrown added feature-request New feature or request e-handoff labels Jan 25, 2021
@brandynbrown brandynbrown added Backlog Handoff wont-fix This will not be worked on and removed blocked Needing somebody else to do something somewhere else first labels Feb 12, 2021
@microsoft microsoft deleted a comment from brandynbrown Feb 18, 2021
@ryancrawcour ryancrawcour added blocked Needing somebody else to do something somewhere else first and removed wont-fix This will not be worked on labels Feb 18, 2021

3 participants