Slower producer throughput on 1.4.1 than on 1.3.5 #1412
This may be caused by the switch to the v2 message format (now supported in Kafka 1.4), which uses a new CRC implementation (see #1389). @tvoinarovskyi what do you think?
Here is a program which, when run on 1.4.1, goes at 1.5k records per second, whereas on 1.3.5 it goes at 5k. It's on Ubuntu 16.04, Python 3.5.2. CPU usage by the program is 100% in both cases. The message is 1200 bytes (so it's 1.8 MB/s and 6 MB/s for the two versions).

```python
import time

from kafka import KafkaProducer

NRECORDS = 50000
FLUSH_EVERY_NRECORDS = 500

kafka_producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda s: s.encode(),
)

start_time = time.clock()
for i in range(NRECORDS):
    kafka_producer.send('delete_me', 'hello world ' * 100)
    if i % FLUSH_EVERY_NRECORDS == 0:
        kafka_producer.flush()
end_time = time.clock()

timediff = end_time - start_time
print('{} seconds ({} requests per second)'.format(timediff, NRECORDS / timediff))
```
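As a sanity check on the figures quoted (a quick sketch, independent of any broker or of kafka-python itself): the payload is `'hello world ' * 100`, i.e. 1200 bytes per record, which ties the record rates to the byte rates:

```python
# Verify the arithmetic in the report above; no broker needed.
payload = 'hello world ' * 100
print(len(payload))               # bytes per record after .encode()  -> 1200
print(len(payload) * 1500 / 1e6)  # MB/s at 1.5k records/s (1.4.1)    -> 1.8
print(len(payload) * 5000 / 1e6)  # MB/s at 5k records/s (1.3.5)      -> 6.0
```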
Yup, definitely a problem with pure python.
Would be good to reference installing the
Updated README to describe `crc32c`. With the latest 1.4.7 and native crc32c installed, performance is as good as or better than on 1.3.5.
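For anyone landing here later: per the comments above, the slow path is the pure-python CRC-32C used for the v2 message format, and installing the native `crc32c` package (`pip install crc32c`) restores performance. A minimal sketch to check whether the native extension is importable (the helper name is mine, not part of kafka-python):

```python
def native_crc32c_available():
    """Return True if the optional native crc32c extension can be imported.

    When it is missing, kafka-python falls back to a pure-python CRC-32C,
    which is the slow path discussed in this thread.
    """
    try:
        import crc32c  # noqa: F401  (install with: pip install crc32c)
        return True
    except ImportError:
        return False

print(native_crc32c_available())
```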
I'm getting quite a bit slower throughput for producers on 1.4.1 than on 1.3.5: about 2.5 MB/s for the former and about 25 MB/s for the latter.
The setup is very simple: one broker (Kafka 1.0.0), one producer, no consumers. The producer records the time, then in a small loop send()s 250 messages, each 40,000 bytes long, to a topic with only one partition (so: 10 MB total). Afterwards, flush() is called and the total time and rate are calculated. (This measurement is then repeated in a loop, but the results seem pretty stable over time.) Whether produced locally or from a second computer over a 10 Gbit/s link, this takes about 0.4 sec with kafka-python 1.3.4 and 1.3.5 (I haven't tested earlier versions), and about 4 sec with kafka-python 1.4.0 or 1.4.1. To switch between these results I only need to uninstall one kafka-python version and install the other.
For this, I've set acks=1, compression=none, plaintext. The producer is running under Python 2.7. The config files on the brokers are nearly the standard ones distributed with Kafka. The topic has no overrides. This has been tested on two machines running Debian as well as one machine running SL7.
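The reported throughput follows directly from that setup; as a quick arithmetic check (no broker involved), 250 messages of 40,000 bytes each is 10 MB, so the ~0.4 s and ~4 s timings correspond to roughly 25 MB/s and 2.5 MB/s:

```python
# Arithmetic behind the throughput figures reported above; no Kafka needed.
total_bytes = 250 * 40000              # 10 MB sent per measurement
for elapsed in (0.4, 4.0):             # approx. timings on 1.3.x vs 1.4.x
    print('{:.1f} MB/s at {} s'.format(total_bytes / elapsed / 1e6, elapsed))
```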
Let me know if there's any further information I can provide. (I'll be the first to admit I'm a bit of a newbie in both the Kafka universe as well as in Python, but I'll do my best) :)