Apache Kafka has become the go-to solution for building scalable, real-time data pipelines. But its traditional reliance on local disk I/O and intra-cluster replication often comes at a steep operational and financial cost — especially in cloud-native environments.
Enter KIP-1150: Diskless Topics, a proposal led and open-sourced by Aiven that radically simplifies Kafka’s architecture by enabling direct writes to object storage such as Amazon S3, bypassing broker-local disks entirely.
What Are Diskless Topics?
Diskless Topics introduce a new Kafka topic type that:
- Writes data directly to remote object storage (e.g., S3, GCS, Azure Blob), skipping local disk altogether
- Removes broker statefulness, allowing any broker to serve any partition
- Eliminates intra-cluster replication, relying instead on the durability of the backing storage
- Reduces storage and IOPS costs, especially in cloud environments
This shift enables Kafka to act more like a cloud-native stateless event router, aligning it with modern data lake architectures and the economics of the cloud.
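Since KIP-1150 is still a proposal, the client-facing surface may change, but the stated intent is that a diskless topic looks like any other topic with a creation-time flag. Here is a minimal sketch using the standard Java AdminClient; the `diskless.enable` config key, topic name, and bootstrap address are illustrative assumptions, not the final API:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class CreateDisklessTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // "diskless.enable" is a placeholder for the topic-level flag
            // proposed in KIP-1150; the final config key may differ.
            // Replication factor 1 reflects the idea that durability is
            // delegated to the object store rather than broker replicas.
            NewTopic topic = new NewTopic("telemetry.events", 12, (short) 1)
                    .configs(Map.of("diskless.enable", "true"));

            admin.createTopics(List.of(topic)).all().get();
            System.out.println("Created diskless topic: telemetry.events");
        }
    }
}
```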
Why This Matters
- Dramatic Cost Reduction
By cutting out local disk writes and cross-zone replication, diskless topics can reduce Kafka storage and replication costs by up to 97%, according to Aiven.
- Stateless Brokers = Simpler Scaling
Without the burden of local persistence, Kafka brokers become easier to scale, upgrade, and fail over. This is a massive win for teams managing large or geo-distributed Kafka clusters.
- Unified Object Storage Strategy
Diskless topics integrate cleanly with data lake ecosystems (e.g., Iceberg, Delta Lake), enabling real-time ingestion directly into your lakehouse using a single, low-cost, durable storage tier.
- Better Multi-Cloud and Multi-Region Architecture
Removing intra-cluster replication makes cross-region Kafka significantly cheaper and easier, enabling robust multi-region deployments without excessive data transfer bills.
- Sustainability and Efficiency
Reducing disk I/O and storage usage also lowers the energy footprint of Kafka infrastructure, a win for both cost and carbon.
Ideal Use Cases for Diskless Topics
Real-Time Ingestion into Apache Iceberg
With Kafka writing directly to object storage, systems like Apache Iceberg can ingest and query fresh event data without an extra copy step, since the events already land in the same storage tier the lakehouse reads from. That eliminates the traditional staging pipelines and file-copy jobs.
Ephemeral or High-Velocity Streams
IoT telemetry, metrics, logs, and other short-lived data can be routed through Kafka without paying the cost of persistence.
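The proposal aims to keep the client protocol unchanged, so these workloads should need nothing more than a standard throughput-tuned producer. The sketch below assumes the `telemetry.events` topic from earlier and illustrative batching values; the intuition is that larger batches matter even more when the broker flushes to object storage, where each PUT carries a fixed cost:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TelemetryProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Larger batches and a longer linger amortize per-request overhead.
        // The values here are illustrative, not tuned recommendations.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "50");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(256 * 1024));
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000; i++) {
                String payload = "{\"sensor\":\"s-" + (i % 16) + "\",\"value\":" + i + "}";
                producer.send(new ProducerRecord<>("telemetry.events", "s-" + (i % 16), payload));
            }
            producer.flush();
        }
    }
}
```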
Stream-to-Lake Pipelines
Ingest events directly into lakehouse systems without needing intermediary disk-backed Kafka setups — streaming at cloud scale, priced like batch.
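As a sketch of that pattern, here is a plain consumer that drains a diskless topic and hands coarse batches to a lake writer. `writeBatchToLake` is a hypothetical placeholder for whatever Iceberg or Delta writer the pipeline actually uses, and the batch size and topic name are illustrative:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class StreamToLakeConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "lake-ingest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("telemetry.events"));
            List<String> buffer = new ArrayList<>();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    buffer.add(record.value());
                }
                // Flush in coarse batches so each lake write produces a
                // reasonably sized file rather than many tiny ones.
                if (buffer.size() >= 10_000) {
                    writeBatchToLake(buffer);   // hypothetical lake writer
                    consumer.commitSync();      // commit only after a durable write
                    buffer.clear();
                }
            }
        }
    }

    // Placeholder: a real pipeline would append the batch to an Iceberg
    // or Delta table via the appropriate writer API.
    static void writeBatchToLake(List<String> batch) {
        System.out.println("Wrote " + batch.size() + " records to the lake");
    }
}
```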
Multi-Region & Edge Streaming
No replication + stateless brokers = Kafka that works better in distributed and hybrid environments, including edge deployments.
Development, Testing, and CI Environments
Diskless topics allow developers to run Kafka for ephemeral use cases without worrying about disk cleanup or durability.
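A minimal sketch of that workflow: create a uniquely named, short-retention topic for the test run and delete it afterwards. As before, `diskless.enable` stands in for whatever config key the KIP finalizes:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

public class EphemeralTopicTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        String name = "ci-run-" + UUID.randomUUID();
        try (Admin admin = Admin.create(props)) {
            // Short retention plus the (placeholder) diskless flag keeps the
            // test run cheap and leaves no broker-local disk state behind.
            NewTopic topic = new NewTopic(name, 1, (short) 1)
                    .configs(Map.of(
                            "diskless.enable", "true",   // placeholder config key
                            "retention.ms", "600000"));  // 10 minutes
            admin.createTopics(List.of(topic)).all().get();

            // ... run the test workload against `name` here ...

            admin.deleteTopics(List.of(name)).all().get();
        }
    }
}
```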
Final Thoughts
Aiven’s Diskless Topics initiative is a major leap toward making Kafka cloud-native — optimizing for performance, cost, and simplicity.
Whether you’re streaming IoT data, building a real-time lakehouse with Apache Iceberg, or running Kafka in multi-cloud or hybrid architectures, this change offers game-changing flexibility.
As the Kafka ecosystem continues to evolve, diskless topics represent a foundational shift — transforming Kafka from a log-oriented storage system to a lightweight, durable event fabric for modern data platforms.
✉️ Working with Kafka or planning to? Let’s connect.
If you’re building real-time data platforms, modernizing your event architecture, or just exploring how Kafka fits into your stack — we’d love to hear about it. Open to collaborations, architecture discussions, or simply sharing ideas. Let’s build something great together.