This repository was archived by the owner on Oct 25, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
461 lines (428 loc) · 17.7 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Massive Datenströme mit Kafka</title>
<meta name="description" content="data2day 2015 - Massive Datenströme mit Kafka">
<meta name="author" content="Frank Wisniewski & Lars Pfannenschmidt">
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no, minimal-ui">
<link rel="stylesheet" href="css/reveal.css">
<link rel="stylesheet" href="css/theme/simple.css" id="theme">
<!-- Code syntax highlighting -->
<link rel="stylesheet" href="lib/css/zenburn.css">
<!-- Printing and PDF exports -->
<script>
var link = document.createElement( 'link' );
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = window.location.search.match( /print-pdf/gi ) ? 'css/print/pdf.css' : 'css/print/paper.css';
document.getElementsByTagName( 'head' )[0].appendChild( link );
</script>
<!--[if lt IE 9]>
<script src="lib/js/html5shiv.js"></script>
<![endif]-->
</head>
<body>
<div class="reveal">
<!-- Any section element inside of this container is displayed as a slide -->
<div class="slides">
<section>
<small><h3>data2day 2015</h3></small>
<h1>Massive Datenströme mit Kafka</h1>
<p>
<small>
<a href="https://github.com/frankwis">Frank Wisniewski</a> - <a href="http://twitter.com/ultraknackig">@ultraknackig</a><br/>
<a href="https://github.com/larsp">Lars Pfannenschmidt</a> - <a href="http://twitter.com/leastangle">@leastangle</a>
</small>
</p>
<aside class="notes">
RabbitMQ? ActiveMQ? Verbrechen wie JMS? Und Kafka...?<br/>
<b>verteilter, partitionierender, replizierender</b> Service für Datenströme.
</aside>
</section>
<section>
<h2>$ whoami</h2>
<h4>Frank Wisniewski</h4>
<ul>
<li>Futsal playing Engineer @Intuit</li>
<li>Event-Driven & Real Time Applications</li>
<li><a href="https://twitter.com/ultraknackig">@ultraknackig</a></li>
</ul>
<p/>
<h4>Lars Pfannenschmidt</h4>
<ul>
<li>Real Time Data Products @Intuit</li>
<li>Founder of <a href="https://twitter.com/mobilecgn">@mobilecgn</a> User Group</li>
<li><a href="https://twitter.com/leastangle">@leastangle</a></li>
</ul>
</section>
<section>
<h2>$ ls -al</h2>
<ul>
<li class="fragment">Motivation</li>
<div class="fragment">
<li>Concepts</li>
<ul>
<li>Topics & Partitions</li>
<li>Offset Management</li>
<li>Log Compaction</li>
</ul>
</div>
<div class="fragment">
<li>Example</li>
<ul>
<li>Producer</li>
<li>Consumer</li>
</ul>
</div>
<li class="fragment">Performance</li>
<li class="fragment">Summary</li>
</ul>
</section>
<section>
<section data-background="img/motivation.jpg">
<h1>Motivation</h1>
<small><a href="https://flic.kr/p/2k2Ww3">high wire 2</a> by Graeme Maclean - Some rights reserved <a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a></small>
<aside class="notes">
interns Projekt LinkedIn; einheitliche Plattform zur Verarbeitung von großen Datenströmen;
</aside>
</section>
<section>
<h2>Chaos</h2>
<img src="img/datapipeline_complex.jpg"/>
<small><a href="http://linkd.in/199iMwY">„The Log: What every software engineer should know about realtime data's unifying abstraction“</a> by Jay Kreps</small>
<aside class="notes">
uni-/bidirektionale Abhaenigkeiten (Daten) kreuz und quer; Unwartbar
</aside>
</section>
<section>
<h2>Order</h2>
<img src="img/datapipeline_simple.jpg"/>
<small><a href="http://linkd.in/199iMwY">„The Log: What every software engineer should know about realtime data's unifying abstraction“</a> by Jay Kreps</small>
<aside class="notes">
entkoppeltes Nachrichtensystem; klassisches funktionierte nicht; wurde LinkedIn's “zentralen Nervensystem”<br/>
Kernanforderungen: <b>Latenz, Skalierbarkeit, Fehlertoleranz & Verfügbarkeit</b>
</aside>
</section>
</section>
<section>
<section data-background="img/concepts.jpg">
<h1>Concepts</h1>
<small><a href="https://flic.kr/p/iRs3gB">1960 Lloyd Arabella</a> by JOHN LLOYD - Some rights reserved <a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a></small>
</section>
<section>
<h2>Overview</h2>
<img width="65%" src="img/overview.png"/>
<small>Topological Overview according to <a href="http://kafka.apache.org/documentation.html">Kafka documentation</a></small>
</section>
<section>
<h2>Topics & Partitions</h2>
<img src="img/topic.png"/>
<small>Anatomy of a topic according to <a href="http://kafka.apache.org/documentation.html">Kafka documentation</a></small>
<aside class="notes">
Nachrichten → Topic; <br/>durable;<br/> n Partitionen und Knoten verteilt;<br/> sequentiell;<br/> Offset pro partition;<br/> Replikation; Retention;<br/>
<b>Obacht:</b> Partitionen Nachträgliches Ändern :(
</aside>
</section>
<section>
<h2>Guarantees</h2>
<ul>
<li>Messages sent to a partition will be appended in the order they are seen</li>
<li>Consumer see messages in the order they are stored in the log</li>
<li>Kafka will tolerate up to <code>n-1</code> server failures for a topic with replication factor <code>n</code></li>
</ul>
</section>
<section>
<h2>Log Compaction</h2>
<img width="65%" src="img/log_compaction.png"/>
<small>Log compaction according to <a href="http://kafka.apache.org/documentation.html">Kafka documentation</a></small>
<aside class="notes">
Compression nicht möglich
</aside>
</section>
</section>
<section>
<section data-background="img/examples.jpg">
<h1>Example</h1>
<small><a href="https://flic.kr/p/mHwYY">Hello World</a> by Bill Bradford - Some rights reserved <a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a></small>
</section>
<section>
<h2>Java Producer</h2>
<pre class="fragment"><code data-trim class="java">
public class News {
public final UUID id;
public final String author, title, body;
...
}
</code></pre>
<pre class="fragment"><code data-trim class="java">
Properties config = new Properties();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, broker);
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, NewsSerializer.class.getName());
KafkaProducer<String, News> producer = new KafkaProducer<>(config);
</code></pre>
<pre class="fragment"><code data-trim class="java">
public RecordMetadata send(News news) throws ExecutionException, InterruptedException {
ProducerRecord<String, News> record = new ProducerRecord<>(topic, news.id.toString(), news);
Future<RecordMetadata> recordMetadataFuture = this.producer.send(record);
return recordMetadataFuture.get();
}
</code></pre>
<aside class="notes">
SourceCode zu klein? Folien bereitgestellt!<br>
Kafka → byte[]<br> Domäne benötigt eigene <b>De-/Serializer</b>
</aside>
</section>
<section>
<h2>Message Distribution</h2>
Message routing to target partition via <code>ProducerRecord</code><br/><br/>
<ul>
<li class="fragment">
Round Robin
<pre><code data-trim class="java">
ProducerRecord(String topic, V value);
</code></pre>
</li>
<li class="fragment">
Via <code>key</code> hash (murmur2)
<pre><code data-trim class="java">
ProducerRecord(String topic, K key, V value);
</code></pre>
</li>
<li class="fragment" style="width:110%">
Via specific partition
<pre><code data-trim class="java">
ProducerRecord(String topic, Integer partition, K key, V value);
</code></pre>
</li>
</ul>
<aside class="notes">
<b>Erinnerung:</b> Reihenfolge pro Partititon garantiert;<br> LogCompaction ohne Key?
</aside>
</section>
<section>
<h2>Consumer Grouping</h2>
<img src="img/consumer_grouping.png"/>
<small>“Kafka 101 - Massive Datenströme mit Apache Kafka” by Pfannenschmidt and Wisniewski, JavaMagazin 08.2015, p. 38</small>
<aside class="notes">
Strategien zum konsumieren: <b>links</b> Queue; <b>rechts</b> Topic/publish-subscribe
<ul>
<li><b>Topic A:</b>Nachrichten <i>entweder</i> von A0 <i>oder</i> A1</li>
<li><b>Topic B:</b>Nachrichten <i>sowohl</i> von B0 <i>als auch</i> von einem Consumer aus Gruppe C empfangen</li>
</ul>
</aside>
</section>
<section>
<h2>Consumer Threading</h2>
<img width="65%" src="img/consumer_threading.png"/>
<small>“Kafka 101 - Massive Datenströme mit Apache Kafka” by Pfannenschmidt and Wisniewski, JavaMagazin 08.2015, p. 41</small>
<aside class="notes">
<ul>
<li>#Threads > #Paritions → überschüssigen Threads erhalten keine Nachrichten (siehe T0)</li>
<li>#Threads < #Paritions → Threads bekommen Nachrichten von mehreren Partitionen → <b>keine Order mehr!!</b></li>
</ul>
</aside>
</section>
<section>
<h2>Java Consumer Thread</h2>
<pre class="fragment"><code data-trim class="java">
public void run() {
Thread.currentThread().setName(name);
ConsumerIterator<String, News> it = messageStream.iterator();
while (it.hasNext()) {
relayMessage(it.next());
}
}
</code></pre>
<pre class="fragment"><code data-trim class="java">
void relayMessage(MessageAndMetadata<String, News> kafkaMessage) {
logger.trace("Received message with key '{}' and offset '{}' "
+ "on partition '{}' for topic '{}'",
kafkaMessage.key(), kafkaMessage.offset(),
kafkaMessage.partition(), kafkaMessage.topic());
messageConsumer.consume(kafkaMessage.message());
}
</code></pre>
<pre class="fragment"><code data-trim class="java">
public interface NewsConsumer<News> {
void consume(News message);
}
</code></pre>
<aside class="notes">
</aside>
</section>
<section>
<h2>Wiring</h2>
<pre class="fragment"><code data-trim class="java">
import static kafka.consumer.Consumer.createJavaConsumerConnector;
Properties props = new Properties();
props.put("zookeeper.connect", zookeeper); // list of ZooKeeper nodes
props.put("group.id", groupId); // identifies consumer group
props.put("offsets.storage", "kafka"); // storage for offsets
props.put("dual.commit.enabled", "false"); // migration switch
...
ConsumerConnector consumerConnector = createJavaConsumerConnector(new ConsumerConfig(props));
</code></pre>
<pre class="fragment"><code data-trim class="java">
Map<String, List<KafkaStream<String, News>>> consumerMap;
consumerMap = consumerConnector.createMessageStreams(
ImmutableMap.of(topic, numberOfThreads), // number of streams per topic
new StringDecoder(null), new NewsDecoder()); // message key and value decoders
List<KafkaStream<String, News>> streams = consumerMap.get(topic);
</code></pre>
<pre class="fragment"><code data-trim class="java">
// create fixed size thread pool to launch all the threads
ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
// create consumer threads to handle the messages
for (final KafkaStream stream : streams) {
String name = String.format("%s[%s]", topic, threadNumber++);
pool.submit(new ConsumerThread(stream, name, consumer));
}
</code></pre>
<small class="fragment">Full Code Example: <a href="https://github.com/kafka101/java-news-feed">https://github.com/kafka101/java-news-feed</a></small>
<aside class="notes">
<ul>
<li><b>High Level Consumer API</b> lagert das Verwalten des Offset nach ZooKeeper oder Kafka aus</li>
<li><b>SimpleConsumer</b> überlässt dies der Consumer-Implementierung</li>
<li>Client Re-Design ist in Arbeit</li>
</ul>
Gott sei Dank ist eine neue Java-API in Arbeit!
</aside>
</section>
</section>
<section>
<section data-background="img/performance.jpg">
<h1>Performance</h1>
<small><a href="https://flic.kr/p/wQgqtx">Grefsenkollen Downhill 2015</a> by Vegar Nilsen - Some rights reserved <a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a></small>
</section>
<section>
<h2>What makes Kafka fast?</h2>
<blockquote cite="http://kafka.apache.org/documentation.html#maximizingefficiency">"[Designed] to make consumption as cheap as possible"</blockquote>
<aside class="notes">
Teure Probleme beim Messaging:<br>
Exactly once → At least once<br>
Order → Partitions
</aside>
</section>
<section>
<h2>What makes Kafka fast?</h2>
<div class="fragment">
<h4>Fast Writes:</h4>
<ul>
<li>linear writes</li>
<li>all writes go to OS pagecache</li>
</ul>
<p/>
</div>
<div class="fragment">
<h4>Fast Reads:</h4>
<ul>
<li>linear reads</li>
<li>direct data transport from <br>pagecache to socket* via <code><a href="http://man7.org/linux/man-pages/man2/sendfile.2.html" target="blank">sendfile()</a></code></li>
</ul>
<br><br>
</div>
<div class="fragment">
<h3>Combined → Fast!</h3>
<p>
<small>* Consumer without lag get messages from pagecache!<br/>
More Details: <a href="http://kafka.apache.org/documentation.html#persistence" target="_blank">http://kafka.apache.org/documentation.html#persistence</a>
</small>
</p>
</div>
<aside class="notes">
Durable queues bei den klassikern "langsam"<br>
linear writes outperform random writes by far!<br>
OS optimiert: read-ahead/write-behind<br>
ACM Artikel: "random access to memory is typically slower than sequential access to disk!"
</aside>
</section>
<section>
<h2>sendfile()?</h2>
<img src="img/traditional data copy vs sendfile.png"/>
<small>Efficient data transfer according to <a href="http://www.ibm.com/developerworks/library/j-zerocopy/">http://www.ibm.com/developerworks/library/j-zerocopy/</a></small>
<aside class="notes">
links ohne, rechts mit <code>sendfile()</code><br>
<code>FileChannel.transferTo()</code>, falls JVM/OS unterstützt → sendfile().
</aside>
</section>
</section>
</section>
<section>
<section data-background="img/summary.jpg">
<h1>Summary</h1>
<small><a href="https://flic.kr/p/ifz6wb">Numbers And Finance</a> by reynermedia - Some rights reserved <a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a></small>
</section>
<section>
<h2>Key Features & Facts</h2>
<ul>
<li class="fragment"><b>Commit Log</b>  Topics, Partitioning, Log Compaction, Compression & Offset Management</li>
<li class="fragment"><b>Real-time</b>      High-Throughput & Low-Latency</li>
<li class="fragment"><b>Persistence</b>   Scalable, centralized & replicated storage</li>
</ul>
</section>
<section>
<h2>Related</h2>
<ul>
<li class="fragment"><b>Clients</b>  Scala, Java, Python, Go, Perl, Erlang etc.</li>
<li class="fragment"><b>Integration</b>  Storm, Samza, Hadoop, ElasticSearch, Logstash, Hive etc.</li>
<li class="fragment"><b>Confluent Platform</b>  Schema Registry, REST Proxy & Camus</li>
</ul>
</section>
</section>
<section>
<h1>Thank You!</h1>
<p>
<a href="http://datanerds.io">datanerds.io</a><br/>
<a href="https://github.com/kafka101">github.com/kafka101</a>
</p>
<p>
<small>
<a href="https://github.com/frankwis">Frank Wisniewski</a> - <a href="http://twitter.com/ultraknackig">@ultraknackig</a><br/>
<a href="https://github.com/larsp">Lars Pfannenschmidt</a> - <a href="http://twitter.com/leastangle">@leastangle</a>
</small>
</p>
</section>
</div>
</div>
<script src="lib/js/head.min.js"></script>
<script src="js/reveal.js"></script>
<script>
// Full list of configuration options available at:
// https://github.com/hakimel/reveal.js#configuration
Reveal.initialize({
controls: true,
progress: true,
history: true,
center: true,
width: 1024,
height: 768,
margin: 0.1,
minScale: 0.8,
maxScale: 1.2,
slideNumber: 'c / t',
transition: 'slide', // none/fade/slide/convex/concave/zoom
// Optional reveal.js plugins
dependencies: [
{ src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } },
{ src: 'plugin/markdown/marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/markdown/markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } },
{ src: 'plugin/highlight/highlight.js', async: true, condition: function() { return !!document.querySelector( 'pre code' ); }, callback: function() { hljs.initHighlightingOnLoad(); } },
{ src: 'plugin/zoom-js/zoom.js', async: true },
{ src: 'plugin/notes/notes.js', async: true }
]
});
</script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-67830207-1', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>