@thealmightygrant
Last active February 13, 2018 20:23
Deep in the Heart of Kafka
<section>
<h1>Deep in the Heart of Kafka</h1>
<h2>Grant Sherrick</h2>
</section>
<section id="what-has-confused-me">
<h2>I have found Kafka to be pretty confusing...</h2>
<ul class="fragment">
<li>Kafka/Confluent Relationship</li>
<li>ZooKeeper</li>
<li>Consumer Groups</li>
<li>Offset Storage</li>
</ul>
</section>
<section id="kafka-platform">
<h2>What is the Kafka Open Source Project?</h2>
<ul class="fragment">
<li>Kafka Brokers</li>
<li>Kafka Java Client APIs
<ul>
<li>Producer (application => topic)</li>
<li>Consumer (topic => application)</li>
<li>Streams (process as streams [in or out])</li>
<li>Connect (topic => external API, external API => topic)</li>
</ul>
</li>
</ul>
</section>
<section id="confluent-platform">
<h2>What is the Confluent Platform?</h2>
<ul class="fragment">
<li>Kafka REST Proxy</li>
<li>Kafka Schema Registry</li>
<li>Kafka Connectors
<ul>
<li>HDFS, S3, Elasticsearch</li>
<li>JDBC [Kafka => SQL DBs and vice versa]</li>
</ul>
</li>
<li>Kafka Clients
<ul>
<li>C/C++, Python, Go, .NET</li>
</ul>
</li>
</ul>
</section>
<section id="versioning">
<h2>What version of Kafka are you running?</h2>
<p class="fragment">How does this relate to your version of Kafka Connect, Kafka Schema Registry, and Kafka REST Proxy?</p>
<h1 class="fragment">TO THE DOCS!!</h1>
<p class="fragment">docs.confluent.io => Release Notes => <a href="https://docs.confluent.io/current/release-notes.html#confluent-platform-3-2-2-release-notes">Confluent Platform 3.2.2 Release Notes</a></p>
</section>
<section id="zookeeper">
<h2>What is stored in ZooKeeper?</h2>
<ol>
<li class="fragment">Cluster membership
<ul><li>Which brokers are alive and part of the cluster?</li></ul>
</li>
<li class="fragment">Topic configuration
<ul><li>Which topics exist? How many partitions and replicas does each have?</li></ul>
<pre><code style="max-height: 150px;">$ kafka-topics --list --zookeeper kafka-01/kafka
__confluent.support.metrics
__consumer_offsets
_schemas
ac-user-event
connect-configs
connect-offsets
connect-status
gainsight-poster-streaming...
health-metrics</code></pre>
</li>
<li class="fragment">Electing a Controller
<ul><li>The controller is one of the brokers and is responsible for partition management.</li></ul>
</li>
</ol>
</section>
<section id="consumer-groups-and-partitioning1">
<h2>How do partitions on a topic relate to consumer groups?</h2>
<pre style="width: 100%;"><code style="max-height: 180px; max-width: 900px;">kafka-consumer-groups --bootstrap-server kafka-01:9092 --group connect-lts-dev-ac-user-event-kc-job --describe
Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers).
TOPIC          PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                       HOST           CLIENT-ID
ac-user-event  8          -               75616           -    consumer-6-0f79da4a-b998-4af6-81c8-3c7b6f8b5a89   /198.18.32.53  consumer-6
ac-user-event  10         75494           75615           121  consumer-8-ec9b0249-873c-4b46-93e1-d2b4f0582a6e   /198.18.32.53  consumer-8
ac-user-event  1          75494           75615           121  consumer-11-1fe0f703-5735-458f-8a68-735da598a552  /198.18.32.53  consumer-11
ac-user-event  0          -               75616           -    consumer-10-ee420d19-597b-4e9f-8196-0679d47d4b19  /198.18.32.53  consumer-10
ac-user-event  5          -               75619           -    consumer-15-d229ce83-5681-447f-8bcd-71ca79d86fcb  /198.18.32.53  consumer-15
ac-user-event  2          -               75615           -    consumer-12-e815df04-f997-464b-a9af-62b7b9e2433c  /198.18.32.53  consumer-12
ac-user-event  6          -               75616           -    consumer-4-b111d140-adc1-4686-9182-04c961965f61   /198.18.32.53  consumer-4
ac-user-event  11         -               75615           -    consumer-9-3f6c5273-1794-4b08-a54e-d8e17bbdc76c   /198.18.32.53  consumer-9
ac-user-event  7          75496           75617           121  consumer-5-c4d3ae65-b0bb-4cbd-b0f4-96e615d3d16f   /198.18.32.53  consumer-5
ac-user-event  3          -               75615           -    consumer-13-3e687067-40a0-4ba3-a0ff-9fedd5a6dd47  /198.18.32.53  consumer-13
ac-user-event  9          -               75615           -    consumer-7-4c6e4d9a-f035-444e-9fdc-16a267ebe7cc   /198.18.32.53  consumer-7
ac-user-event  4          75497           75617           120  consumer-14-bf24ff91-41f4-4e7a-b008-769e2456e57f  /198.18.32.53  consumer-14</code></pre>
<ul>
<li class="fragment">Kafka does not guarantee order across partitions (i.e. only messages within a partition are in order).</li>
<li class="fragment">Within a consumer group, each partition is read by exactly one consumer; a consumer may own one or more partitions.</li>
<li class="fragment">Consumers in a consumer group share ownership of the partitions in the topics they subscribe to.</li>
<li class="fragment">If a group has more consumers than the topic has partitions, the extra consumers sit idle.</li>
<li class="fragment"><a href="https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html">More info and examples</a></li>
</ul>
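<p>The sharing rule above can be sketched in a few lines. This is a simplified round-robin model, not Kafka's actual assignor; the function name <code>assign_partitions</code> and the consumer names are made up for illustration. The 12 partitions mirror the <code>ac-user-event</code> output above.</p>

```python
def assign_partitions(partitions, consumers):
    """Hand out partitions round-robin so each partition has exactly one owner.

    Hypothetical helper for illustration; Kafka's real assignors are more involved.
    """
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    if not ordered:
        return assignment
    for index, partition in enumerate(sorted(partitions)):
        owner = ordered[index % len(ordered)]
        assignment[owner].append(partition)
    return assignment


# 12 partitions, as in the ac-user-event topic shown above.
partitions = list(range(12))

# Fewer consumers than partitions: every consumer owns several partitions.
three = assign_partitions(partitions, ["c1", "c2", "c3"])

# More consumers than partitions: the surplus consumers own nothing.
fifteen = assign_partitions(partitions, [f"c{i:02d}" for i in range(15)])
idle = [consumer for consumer, owned in fifteen.items() if not owned]
```

<p>With 15 consumers and 12 partitions, three consumers end up in <code>idle</code>, which is the "not enough partitions" case.</p>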
</section>
<section id="consumer-groups-and-partitioning2">
<h2>How do partitions get assigned to consumers within a group?</h2>
<ul>
<li class="fragment">Each partition is assigned one owner in a rebalancing phase.</li>
<li class="fragment">When a consumer in a group dies, all of the consumers that are still alive:
<ul>
<li>Stop work</li>
<li>Give up ownership of their partitions</li>
<li>Request to rejoin the group</li>
</ul>
</li>
<li class="fragment">The "dead" consumers will not unsubscribe from their partitions.</li>
</ul>
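<p>A toy model of the rebalancing steps above, assuming a simple round-robin hand-out. The class <code>ConsumerGroup</code> and its methods are hypothetical and only mimic the behaviour described on this slide; they are not Kafka's group protocol.</p>

```python
class ConsumerGroup:
    """Toy consumer group: every membership change triggers a full rebalance."""

    def __init__(self, partitions):
        self.partitions = sorted(partitions)
        self.members = []
        self.generation = 0   # counts how many rebalances have happened
        self.assignment = {}

    def _rebalance(self):
        # All live members give up what they owned, then every partition is
        # handed out again so that it has exactly one owner.
        self.generation += 1
        self.assignment = {member: [] for member in self.members}
        for index, partition in enumerate(self.partitions):
            if self.members:
                owner = self.members[index % len(self.members)]
                self.assignment[owner].append(partition)

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def remove_dead(self, member):
        # A dead member cannot release its partitions itself; the group drops
        # it and the survivors rebalance without it.
        self.members.remove(member)
        self._rebalance()


group = ConsumerGroup(range(6))
for name in ["a", "b", "c"]:
    group.join(name)
before = dict(group.assignment)

group.remove_dead("c")
after = dict(group.assignment)
```

<p>After <code>"c"</code> is removed, its partitions are redistributed to <code>"a"</code> and <code>"b"</code>, and every partition still has exactly one owner.</p>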
</section>
<section id="consumer-groups-and-partitioning3">
<h2>What happens to the partitions of "dead" consumers?</h2>
<ul class="fragment">
<li>Consumer timeout is computed from:
<ul>
<li class="fragment">The Kafka client's session timeout, set via: <pre><code>session.timeout.ms</code></pre></li>
<li class="fragment">The rebalance timeout, which for the older ZooKeeper-based consumer equals: <pre><code>rebalance.backoff.ms * rebalance.max.retries</code></pre></li>
<li class="fragment">No data is read from Kafka during this time.</li>
<li class="fragment">A session timeout that is too small can trigger unnecessary rebalances, which shows up as lag.</li>
</ul>
</li>
</ul>
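<p>A rough worked example of the arithmetic above. The numbers are illustrative values, not settings read from any cluster, and adding the session timeout to the rebalance timeout to get a worst-case pause is an assumption made for this sketch. The function names are hypothetical.</p>

```python
def rebalance_timeout_ms(backoff_ms, max_retries):
    """rebalance.backoff.ms * rebalance.max.retries, as on the slide."""
    return backoff_ms * max_retries


def worst_case_pause_ms(session_timeout_ms, backoff_ms, max_retries):
    """Assumed upper bound: time to notice the dead consumer plus time to rebalance."""
    return session_timeout_ms + rebalance_timeout_ms(backoff_ms, max_retries)


# Illustrative values only.
session_timeout = 10_000   # session.timeout.ms
backoff = 2_000            # rebalance.backoff.ms
retries = 4                # rebalance.max.retries

rebalance_budget = rebalance_timeout_ms(backoff, retries)        # 8000 ms
pause = worst_case_pause_ms(session_timeout, backoff, retries)   # 18000 ms
```

<p>Under these assumed values, the dead consumer's partitions could go unread for up to about 18 seconds.</p>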
</section>
<section id="consumer-group-offsets-topic1">
<h2>Where are the offsets for a topic stored?</h2>
<ul>
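<li class="fragment">Before Kafka 0.8.2, offsets were stored in ZooKeeper...</li>
<li class="fragment">From Kafka 0.8.2 on, offsets can be stored in the <code>__consumer_offsets</code> topic (the Java consumer introduced in 0.9 always stores them there)</li>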
</ul>
</section>
<section id="consumer-group-offsets-topic2">
<h2>How does the Consumer Offsets topic work?</h2>
<ul>
<li class="fragment">The consumer group ID is hashed to choose a partition of the <code>__consumer_offsets</code> topic.</li>
<li class="fragment">The broker that leads that partition becomes the consumer group's coordinator.</li>
<li class="fragment">The coordinator maintains offsets for this consumer group, for all partitions of the topic.</li>
<li class="fragment">Consumer offsets expire...</li>
</ul>
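<p>The hashing step above can be sketched as follows. Kafka runs on the JVM, so this sketch re-implements Java's <code>String.hashCode()</code> in Python. The function names are hypothetical, and the default of 50 partitions is an assumption about the broker's <code>offsets.topic.num.partitions</code> setting; check your own cluster's value.</p>

```python
def java_string_hash(text):
    """Replicate Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    units = text.encode("utf-16-be")  # Java hashes UTF-16 code units
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id, num_partitions=50):
    """Pick the __consumer_offsets partition for a group (sketch, see assumptions)."""
    h = java_string_hash(group_id)
    # Avoid a negative index; the smallest 32-bit value has no positive
    # counterpart, so it is mapped to 0.
    non_negative = 0 if h == -2**31 else abs(h)
    return non_negative % num_partitions


# Group ID taken from the kafka-consumer-groups example earlier in the deck.
partition = offsets_partition_for("connect-lts-dev-ac-user-event-kc-job")
```

<p>The same group ID always maps to the same partition, so the group's coordinator only changes when leadership of that partition moves to another broker.</p>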
</section>
<section id="conclusions">
<ul>
<li class="fragment">Kafka = OSS, Confluent = Company that makes many nice Kafka products</li>
<li class="fragment">ZooKeeper does very little (only used for high-level coordination of brokers)</li>
<li class="fragment">Partitions and Consumer Groups are closely tied together...</li>
</ul>
</section>
<section>
<h2>Thanks!</h2>
</section>