HashMap in WordCount #4

HarvinderBhullar · 2016-11-14T09:24:11Z

Hi Idris/Sathish

How is the wordCount HashMap in the sample code going to behave in a clustered environment?

Br
Harvinder

godofwharf · 2016-11-15T10:04:47Z

Hi Harvinder,
The word count example in the samples is just a toy example. To run in clustered mode, you need to use a partitioned source. For example, publish sentences to a Kafka topic partitioned by sentence and use KafkaSource instead of RandomSentenceSource in the computation. All the components (source and processors) run within a single JVM, so the entire computation is scaled horizontally by spawning more processes. KafkaSource balances the partitions across the running instances.
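
To make the effect on the HashMap concrete, here is a minimal sketch of what partitioning by sentence implies. It is not the sample code: the class `PartitionedWordCount` and its methods are hypothetical, and the partition choice (hash of the sentence modulo the partition count) is a simplification, not Kafka's actual partitioner. Each partition stands in for one instance with its own local HashMap, so each map holds only partial counts until they are merged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedWordCount {

    // Simplified stand-in for a partitioner: hash of the sentence modulo partition count.
    static int partitionFor(String sentence, int numPartitions) {
        return Math.floorMod(sentence.hashCode(), numPartitions);
    }

    // One local HashMap per partition, mimicking one HashMap per instance/JVM.
    static List<Map<String, Integer>> countPerPartition(List<String> sentences, int numPartitions) {
        List<Map<String, Integer>> local = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            local.add(new HashMap<>());
        }
        for (String sentence : sentences) {
            Map<String, Integer> counts = local.get(partitionFor(sentence, numPartitions));
            for (String word : sentence.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return local;
    }

    // Combines the partial counts into global counts.
    static Map<String, Integer> mergeAll(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("the quick brown fox", "the lazy dog", "a quick nap");
        List<Map<String, Integer>> partials = countPerPartition(sentences, 2);
        for (int i = 0; i < partials.size(); i++) {
            System.out.println("instance " + i + " sees " + partials.get(i));
        }
        System.out.println("merged: " + mergeAll(partials));
    }
}
```

Because the same word can appear in sentences that land on different partitions, no single local map is guaranteed to hold the full count for that word; only the merged view does.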

HarvinderBhullar · 2016-11-15T10:21:43Z

My question was about the target HashMap. Anyway, are you saying that this HashMap is going to be a KafkaSink in a distributed environment?

godofwharf · 2016-11-15T10:29:03Z

Sure, you can rewrite the WordCounter processor to use a distributed cache like Redis or Hazelcast to maintain counts in cluster mode.
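
A minimal sketch of that idea, under stated assumptions: `CountStore`, `InMemoryCountStore`, and this `WordCounter` are hypothetical names for illustration and do not reflect the sample's actual processor API. The in-memory store only stands in for a shared cache so the sketch runs without a server; in a real cluster it would be backed by something like Redis or Hazelcast, with an atomic increment on the server side.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedWordCount {

    // Hypothetical abstraction over a shared count store (assumption, not the sample's API).
    interface CountStore {
        long increment(String word);
        long get(String word);
    }

    // In-memory stand-in for a distributed cache, used only to keep the sketch runnable.
    static class InMemoryCountStore implements CountStore {
        private final Map<String, Long> counts = new ConcurrentHashMap<>();

        @Override
        public long increment(String word) {
            // merge is atomic on ConcurrentHashMap and returns the updated count
            return counts.merge(word, 1L, Long::sum);
        }

        @Override
        public long get(String word) {
            return counts.getOrDefault(word, 0L);
        }
    }

    // Hypothetical processor: keeps no local HashMap and writes every count to the shared store.
    static class WordCounter {
        private final CountStore store;

        WordCounter(CountStore store) {
            this.store = store;
        }

        void process(String sentence) {
            for (String word : sentence.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    store.increment(word);
                }
            }
        }
    }

    public static void main(String[] args) {
        CountStore shared = new InMemoryCountStore();
        // Two counters play the role of two instances sharing one store.
        WordCounter instanceA = new WordCounter(shared);
        WordCounter instanceB = new WordCounter(shared);
        instanceA.process("the quick brown fox");
        instanceB.process("the lazy dog");
        System.out.println("the = " + shared.get("the")); // prints "the = 2"
    }
}
```

Keeping the store behind an interface means the processor logic stays the same whether the counts live in a local map for tests or in a shared cache in cluster mode.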
