HashMap in WordCount #4

Open
HarvinderBhullar opened this issue Nov 14, 2016 · 3 comments

@HarvinderBhullar

Hi Idris/Sathish

How is the wordCount HashMap in the sample code going to behave in a clustered environment?

Br
Harvinder

@godofwharf

Hi Harvinder,
The word count example in the samples is just a toy example. To run in clustered mode, you need to use a partitioned source. For example, publish sentences to a Kafka topic partitioned by sentence, and use KafkaSource instead of RandomSentenceSource in the computation. All the components (source and processors) run within a single JVM, so the entire computation is scaled horizontally by spawning additional processes. The KafkaSource balances the partitions across the running instances.
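
To illustrate the idea of a partitioned source, here is a minimal sketch of how keys could map to partitions and how partitions could be split across processes. `PartitionSketch` and its methods are hypothetical names invented for this illustration. They are not part of this project or of the Kafka client, and real Kafka consumer groups use their own assignment strategies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the project's or Kafka's API.
final class PartitionSketch {

    // Map a key (here, a sentence) to a partition.
    // The same key always lands on the same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Assumed round-robin split: instance i owns every partition p
    // where p % numInstances == i.
    static List<Integer> partitionsOwnedBy(int instance, int numInstances, int numPartitions) {
        List<Integer> owned = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            if (p % numInstances == instance) {
                owned.add(p);
            }
        }
        return owned;
    }
}
```

Note that when sentences are partitioned by sentence, the same word can still show up on different instances, so a HashMap local to each JVM would hold only partial counts.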

@HarvinderBhullar
Author

My question was about the target HashMap. Anyway, what you are saying is that this HashMap is going to be a KafkaSink in a distributed environment?

@godofwharf

Sure, you can rewrite the WordCounter processor to use a distributed cache like Redis or Hazelcast to maintain counts in cluster mode.
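
As a rough sketch of that rewrite, the counter below depends only on `java.util.concurrent.ConcurrentMap` rather than on a local `HashMap`. `SharedWordCounter` is a hypothetical stand-in, not the sample's actual WordCounter processor, and the `ConcurrentHashMap` used in the demo only simulates a shared store inside one JVM. In a real cluster you would pass in a map backed by a distributed cache (Hazelcast's `IMap` implements `ConcurrentMap`, for instance), and you should check that store's atomicity guarantees for read-modify-write updates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the sample's WordCounter processor.
final class SharedWordCounter {
    private final ConcurrentMap<String, Long> counts;

    // The map is injected so that every instance can point at the same store.
    SharedWordCounter(ConcurrentMap<String, Long> counts) {
        this.counts = counts;
    }

    // Split the sentence on whitespace and increment each word's count.
    void process(String sentence) {
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
    }

    long countOf(String word) {
        return counts.getOrDefault(word, 0L);
    }
}

final class SharedWordCounterDemo {
    public static void main(String[] args) {
        // Stand-in for a distributed map; both "instances" share it.
        ConcurrentMap<String, Long> shared = new ConcurrentHashMap<>();
        SharedWordCounter instanceA = new SharedWordCounter(shared);
        SharedWordCounter instanceB = new SharedWordCounter(shared);

        instanceA.process("the quick brown fox");
        instanceB.process("the lazy dog");

        System.out.println(instanceA.countOf("the")); // prints 2
    }
}
```

Because both instances write to the same map, the count for a word is the cluster-wide total regardless of which instance processed the sentence.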
