producer of 1.4.2 version may occupy large memory #1480

Closed
joe-zhan opened this issue Apr 25, 2018 · 7 comments

Comments

@joe-zhan

I use kafka-python 1.4.2 to collect my data into Kafka, but the memory usage of the Python process increases progressively until ops send a low-memory alarm and I have to restart it. I rolled back kafka-python to 1.3.4, and everything is OK.

My environment is CentOS 7, Python 2.7.5.

@jeffwidman
Contributor

What version of Kafka?

Do you have example code showing how you're producing?

Was the rollback to 1.3.4 the only thing you changed?

@joe-zhan
Author

joe-zhan commented Apr 27, 2018

Here is my Python script; it just does a `tail -f` on files with names like 'test_2018042700.txt' and sends the new lines to Kafka. My Kafka version is 0.10.2 with Scala 2.12.
new_monitor_topic_withmark_v0.10.2.py.txt

I didn't change anything else, only rolled back kafka-python to 1.3.4; after that, the memory usage stays at 19 MB.

@joe-zhan
Author

It seems that every second I create a new KafkaProducer object and close it after the new lines are sent, and version 1.4.2 doesn't release the memory after I close the producer. I don't know whether this is an issue or not; maybe I'm just using kafka-python the wrong way.
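A minimal sketch of the pattern described above (a reconstruction, not the attached script; the topic name, file name, and broker address are placeholders): a fresh producer is created and closed on every polling cycle, which is the usage the reporter suspects of leaking memory on 1.4.2.

```python
import time


def read_new_lines(f):
    """Return any lines appended to an open file since the last read (tail -f style)."""
    return f.readlines()


def send_lines(producer, topic, lines):
    """Send each line to Kafka and block until delivery completes."""
    for line in lines:
        producer.send(topic, line.encode("utf-8"))
    producer.flush()


if __name__ == "__main__":
    from kafka import KafkaProducer  # pip install kafka-python

    with open("test_2018042700.txt") as f:
        f.seek(0, 2)  # start at end of file, like tail -f
        while True:
            # Pattern reported in this issue: a new producer every second,
            # closed immediately after sending.
            producer = KafkaProducer(bootstrap_servers="localhost:9092")
            send_lines(producer, "my_topic", read_new_lines(f))
            producer.close()
            time.sleep(1)
```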

@Ronniexie

I have the same problem.
I use kafka-python 1.4.2.
@namasamitabha Have you solved this problem?

@joe-zhan
Author

@Ronniexie Not really. I just rolled kafka-python back to the previous version I had been using, and these days I am trying to switch the collection script from Python to Filebeat.

@Ronniexie

#1412
Is it related to this problem?

@jeffwidman
Contributor

I haven't tested myself, but from the problem description, it sounds like #1412 is a CPU problem, not a memory problem.

If you're instantiating/tearing down a new KafkaProducer instance every time, that is a huge waste of resources, and would certainly use additional memory.

For most scenarios, you should create a single long-lived instance per Python program and then use that single instance to send all messages to Kafka... you can send to multiple topics with the same instance; treat it like a database connection. The only exception is if you need separate instances with different configs that can only be set at instantiation.
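The single-long-lived-instance recommendation above can be sketched as follows (the broker address and topic names are placeholders; the `get_producer` helper is a hypothetical convenience, not part of kafka-python):

```python
import atexit

_producer = None


def get_producer(factory):
    """Return one shared producer for the whole program, creating it on first use."""
    global _producer
    if _producer is None:
        _producer = factory()
        atexit.register(_producer.close)  # close once, at interpreter exit
    return _producer


if __name__ == "__main__":
    from kafka import KafkaProducer  # pip install kafka-python

    producer = get_producer(
        lambda: KafkaProducer(bootstrap_servers="localhost:9092")
    )
    # The same instance can serve multiple topics.
    producer.send("topic_a", b"hello")
    producer.send("topic_b", b"world")
    producer.flush()
```

Every later call to `get_producer` returns the same object instead of paying the cost of a new connection, metadata fetch, and background threads each time.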

I'm going to close this ticket, as it is most likely user error. Happy to re-open if someone has a code snippet showing problems with a single instance.
