Skip to content

Instantly share code, notes, and snippets.

@rsrini7
Created April 1, 2025 14:42
Show Gist options
  • Save rsrini7/d33eaa8b5f9f6d74122e21e2cc489879 to your computer and use it in GitHub Desktop.
Save rsrini7/d33eaa8b5f9f6d74122e21e2cc489879 to your computer and use it in GitHub Desktop.
kafka 4.0 features
**Apache Kafka 4.0 Details:**
* **Cloud-Native Focus:** The primary theme of Kafka 4.0 is its continued evolution towards being more cloud-native (from video).
* **ZooKeeper Removal (KIP-500):**
* ZooKeeper has been fully removed and deprecated (from video)
* KRaft is now the standard for metadata management, offering improved scalability and resilience (from video).
* This change simplifies cloud deployments by eliminating a component that posed networking challenges (from video).
* **KIP-848: New Consumer Rebalance Protocol**
* "Lightweight Clients, Stable and Rapid Rebalances": Experience significantly faster and more stable consumer group rebalances[cite: 1].
* **Scalability and Efficiency:**
* Broker-driven rebalancing: Shifts coordination from clients to brokers, reducing client overhead[cite: 2].
* Multi-threaded broker processing: Parallelizes rebalances, handling large groups efficiently[cite: 2, 3].
* Improved Heartbeat API: Provides more efficient communication between clients and brokers with less network traffic[cite: 3].
* **Simplified Management:** Server-side configuration of heartbeats, timeouts, and assignors streamlines consumer setup[cite: 4].
* **Optimized Rebalancing:** Incremental, multi-threaded rebalancing allows for faster partition assignment for new consumers[cite: 5].
* **Server Enabled, Client Opt-In:**
* Enabled by default on Kafka 4.0 servers[cite: 6].
* Clients activate with group.protocol=consumer[cite: 6].
* **Enhanced Monitoring:** New metrics provide deeper insights into rebalance performance[cite: 7].
* This protocol shifts rebalance functionality from the client to the broker (from video)
* This is beneficial for cloud deployments, providing better control over consumer groups and rebalances (from video).
* Enhanced metrics are included for better observability during rebalances (from video).
* **KIP-932: Queues for Kafka (early access)**
* **What it is:**
* Introduces a new way to consume messages in Kafka, enabling cooperative consumption through "share groups"[cite: 8].
* Decouples consumers from partitions, so multiple consumers can read the same partition[cite: 9].
* Designed for queuing use cases, where higher throughput and parallel processing are desired[cite: 9].
* **New Features:**
* KafkaShareConsumer: A new client interface for interacting with share groups[cite: 10].
* Kafka CLI:
* kafka-console-share-consumer.sh: For consuming data using a share group[cite: 10].
* kafka-share-groups.sh: For managing share groups[cite: 11].
* AdminClient: Includes new methods for managing share groups[cite: 11].
* Metrics: Share groups have their own metrics (e.g., consumer lag) for monitoring and scaling[cite: 12].
* New APIs: Introduces ShareGroupHeartbeat, ShareGroupDescribe, ShareFetch, and ShareAcknowledge to support share groups[cite: 13].
* The goal of this release is to test the new share group functionality[cite: 14].
* **NOT FOR PRODUCTION USE**[cite: 15].
* Undocumented configs necessary in the server.properties file[cite: 15].
* Cannot upgrade clusters if share groups are enabled[cite: 15].
* Introduces share groups as an experimental feature, which is the beginning of queue functionality in Kafka (from video).
* Share groups allow multiple consumers to consume from the same partition (from video).
* This feature is **not for production use** and requires an undocumented configuration (from video).
* **KIP-1106: Add Duration-Based Offset Reset Option for Consumer Clients**
* KIP-405 added tiered storage, which made auto.offset.reset=earliest less fun[cite: 16].
* New duration-based offset reset: by\_duration[cite: 16].
* No change to the existing semantics and default value of auto.offset.reset[cite: 17].
* KafkaConsumer: Updated auto.offset.reset config option to support new config value of the format by\_duration:<duration>[cite: 18, 19].
* This automatically resets the offsets back by configured duration[cite: 19].
* KafkaShareConsumer: Add support for by\_duration:<duration> config value for share.auto.offset.reset[cite: 19].
* Kafka Streams: Deprecate existing Topology.AutoOffsetReset enum, add a new class org.apache.kafka.streams.AutoOffsetReset to capture the reset strategies[cite: 20].
* Adds the option to reset consumer offsets by duration (from video).
* This is particularly useful for tiered storage, preventing the need to read through large amounts of old data (from video).
* **KIP-1102: Clients re-bootstrapping based on timeout or error code**
* KIP-899 Limitations: Re-bootstrapping only occurred when no brokers were available[cite: 21].
* Clients could get stuck with stale metadata if connected to some brokers[cite: 22].
* **KIP-1102 Improvements:**
* Proactive re-bootstrapping: Introduces a metadata refresh timeout (default 5 minutes)[cite: 23].
* Clients re-bootstrap if no metadata update is received within this time[cite: 24].
* Server-triggered re-bootstrapping: Brokers/proxies can now return a REBOOTSTRAP\_REQUIRED error code, explicitly telling clients to re-bootstrap[cite: 25].
* Default in 4.0: metadata.recovery.strategy defaults to rebootstrap for new clients, improving resilience out-of-the-box[cite: 26].
* Improves the bootstrapping process (where clients discover broker metadata) (from video).
* Proactive and server-triggered re-bootstrapping prevents issues with stale metadata, especially after long client downtimes (from video).
* **KIP-966: Eligible Leader Replicas (part 1)**
* Part 1 (Included in Kafka 4.0): Enhanced Data Safety (No Unclean Leader Election):
* Introduces Eligible Leader Replicas (ELR): A subset of ISR replicas guaranteed to have complete data up to the high-watermark[cite: 27, 28].
* ELRs are safe for leader election, preventing data loss[cite: 28].
* ELR tracking: Managed by the KRaft controller, stored in PartitionRecord, no performance impact[cite: 29].
* DescribeTopicPartitions API introduced (default in 3.9, so not new in 4.0)[cite: 29, 30].
* Admin clients now use this API instead of the Metadata API[cite: 30].
* Requires minimum metadata version 4.0 IV1[cite: 31].
* Note: ELR is not enabled by default in 4.0. May become default in 4.1[cite: 31].
* Protects against data loss in specific scenarios involving single in-sync replicas (from video).
* Introduces "eligible leader replicas" which are guaranteed to have data up to the watermark (from video).
* **KIP-1076 & KIP-1091: Unlocking Kafka Streams Observability**
* **Before KIP-1076 & KIP-1091**
* Manual and cumbersome health assessment[cite: 32].
* Developers had to instrument and export metrics[cite: 32].
* Operators relied on dashboards for application health[cite: 32].
* **KIP-1076 Highlights**
* Extends KIP-714 (AK 3.7 in Feb, 2024) for Kafka Streams applications[cite: 32].
* Cluster operators can pull application metrics from the application[cite: 32].
* **KIP-1091 Highlights**
* Adds state metrics for each StreamThread and the client instance itself[cite: 32].
* Provides detailed visibility into application state[cite: 32].
* KIP-1076 allows the cluster to pull observability data from Kafka Streams clients (from video).
* KIP-1091 extends this with specific metadata about stream threads (from video).
* **KIP-1112: Custom Processor Wrapping**
* **Before KIP-1112**
* Cross-cutting logic required manual integration into each processor[cite: 33].
* DSL users lacked a straightforward method to apply these altogether[cite: 33].
* Resulted in duplicated code and increased maintenance efforts[cite: 33].
* **KIP-1112 Highlights**
* Introduces ProcessorWrapper interface[cite: 33].
* Inject custom logic around both Processor API and DSL processors seamlessly[cite: 33].
* Enables custom code to be run automatically within Kafka Streams DSL methods (from video).
* This facilitates cross-cutting concerns like logging within DSL calls (from video).
* **KIP-1104: Extract ForeignKey from Key in KTable Join**
* **Before KIP-1104**
* Foreign key extractor function could only access the record value[cite: 34].
* Required duplicating the record key into the value if one wants to use the key for FK joins[cite: 34].
* More work, more storage overhead[cite: 34].
* **KIP-1104 Highlights**
* Foreign keys can now be extracted from both the record key and record value[cite: 34].
* Simplifies the join process with a more intuitive API[cite: 34].
* Allows extracting a foreign key from the key of a message, simplifying join operations (from video).
* **KIP-1065: Better Reliability with Automatic Retry Mechanisms**
* **Before KIP-1065**
* Persistent errors (e.g., broker unavailable, missing output topic)[cite: 35].
* Default retry mechanism until timeout without the ability to break the retry loop, even after timeout[cite: 35].
* Expensive task recovery[cite: 35].
* **KIP-1065 Improvements**
* Enables users to break retry loops[cite: 35].
* "RETRY" option in ProductionExceptionHandler[cite: 35].
* Custom handlers for more control: Retry, Fail gracefully, Drop record and continue[cite: 35].
* Improves logging and handling of errors when output topics are unavailable (from video).
* **Summary of Key Changes:**
* ZooKeeper is replaced by KRaft for metadata management (from video).
* Queue functionality is in progress, with share groups as the first step (from video).
* Users are advised to read the release notes for complete details (from video).
@rsrini7
Copy link
Author

rsrini7 commented Apr 1, 2025

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment