Welcome
We’re thrilled to launch our first monthly newsletter! Your open source data stack is moving faster than ever, so we’re here to cut through the noise. This edition covers the must-know updates, from Postgres AIO to Diskless Kafka. Enjoy the read!
PostgreSQL 18 Major Release
The PostgreSQL Global Development Group released PostgreSQL 18 on September 25, 2025. The most significant feature is the introduction of a dedicated Asynchronous I/O (AIO) subsystem. This is a critical performance boost that allows the database to issue multiple I/O requests concurrently rather than waiting for each one to finish (synchronous I/O). Benchmarks show performance gains of up to 3x in read-heavy workloads like sequential and bitmap heap scans.
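For those who want to try it, AIO behavior is controlled through new server settings. A minimal postgresql.conf sketch follows; the parameter names come from the PostgreSQL 18 release material, but check the v18 documentation for exact defaults on your platform:

```ini
# PostgreSQL 18 AIO settings (sketch -- verify defaults against the v18 docs)
io_method = worker            # 'sync' (pre-18 behavior), 'worker', or 'io_uring' (Linux only)
io_workers = 3                # background I/O worker processes when io_method = worker
effective_io_concurrency = 16 # how many I/Os to keep in flight for scans and prefetching
```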
How AIO Works, and How It Differs from the Old I/O Model
The core difference between pre-18 versions and the new release lies in the I/O model and who controls it. Previous versions used a synchronous, blocking model: a backend process would wait for each individual disk read to complete before proceeding. The database largely relied on the OS to optimize disk access, issuing hints like posix_fadvise() and depending on the kernel’s generic read-ahead. Because PostgreSQL handed the kernel only one request at a time, the kernel’s I/O scheduler had almost nothing to reorder or batch.
In contrast, PostgreSQL 18 introduces a database-driven Asynchronous I/O (AIO) subsystem. This model is asynchronous and non-blocking: a process can submit multiple I/O requests and continue working, overlapping I/O wait time with useful CPU computation. Crucially, the database now actively queues a deeper stream of requests (via dedicated I/O worker processes or, on Linux, the high-performance io_uring interface) and submits them to the kernel in batches. With a larger pool of outstanding requests, the kernel scheduler can reorder and merge operations far more effectively, maximizing throughput even on a single disk and delivering the up-to-3x improvements observed in read-heavy workloads.
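The overlap is easier to see in miniature. The asyncio sketch below is only an analogy for the scheduling idea (it does not touch PostgreSQL): four simulated "disk reads" issued one at a time take roughly the sum of their latencies, while the same four issued concurrently take roughly the longest single latency.

```python
import asyncio

LATENCY = 0.05  # simulated per-request disk latency, in seconds

async def disk_read(block: int) -> str:
    """Pretend to fetch one block; the await is where real I/O would happen."""
    await asyncio.sleep(LATENCY)
    return f"block-{block}"

async def synchronous_style(blocks: range) -> list:
    """Old model: wait for each read to finish before issuing the next."""
    return [await disk_read(b) for b in blocks]

async def aio_style(blocks: range) -> list:
    """New model: submit every read up front, then collect the results."""
    return await asyncio.gather(*(disk_read(b) for b in blocks))

# Sequential: ~4 x LATENCY of wall-clock time. Concurrent: ~1 x LATENCY.
print(asyncio.run(synchronous_style(range(4))))
print(asyncio.run(aio_style(range(4))))
```

The data that comes back is identical either way; only the amount of time spent idle, waiting on the "disk", changes.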
Valkey 9.0 Enterprise Features
The Valkey project, the high-performance, open source key/value data store, launched Valkey 9.0 at the end of September, marking a major milestone for the community-driven fork of Redis. This release significantly expands the platform’s capabilities by introducing several features aimed at delivering enterprise-grade reliability, security, and operational safety.
Key additions focus on increasing system stability and compliance. This includes a new Safe Shutdown Mode and a Unified Auto-Failover Configuration to ensure predictable and consistent high availability across all deployment environments, minimizing the risk of accidental downtime. For regulated industries, Valkey 9.0 builds security directly into the core platform by introducing TLS certificate-based automatic client authentication (mTLS). This native support for mutual TLS helps organizations lock down access and streamline security audits without relying on external proxies.
Per-Field Expiration for Hash Objects
One of the most significant new features for developers and cache architects is the introduction of per-field expiration for hash objects. Previously, expiration in Valkey (and Redis) worked only at the level of the entire key; setting a time-to-live (TTL) meant the entire hash, list, or string object would be evicted when the timer expired.
With Valkey 9.0, users can now set an individual TTL on a specific field within a hash. This dramatically improves cache-management flexibility, especially for applications dealing with complex or heterogeneous data. For example, a user profile stored as a single hash might contain static data (never expires), a short-lived authentication token (expires in 15 minutes), and a semi-volatile list (expires in 24 hours). Instead of forcing developers to split this data across multiple independent keys and write application logic to rejoin them, Valkey 9.0 lets it all live in a single, well-organized hash. This shrinks the keyspace, simplifies application code, and optimizes memory usage by eliminating the need to expire and re-fetch an entire object just because one small piece of it has gone stale.
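To make the semantics concrete, here is a minimal in-memory sketch of per-field expiration. This is plain Python, not the Valkey implementation; in Valkey itself the feature is exposed through hash-field TTL commands (the HEXPIRE/HTTL family, whose exact names and options should be checked against the Valkey 9.0 docs):

```python
import time

class HashWithFieldTTL:
    """Toy model of a hash whose individual fields expire independently."""

    def __init__(self):
        self._data = {}    # field -> value
        self._expiry = {}  # field -> absolute expiry timestamp (absent = never)

    def hset(self, field, value, ttl=None):
        self._data[field] = value
        if ttl is not None:
            self._expiry[field] = time.monotonic() + ttl
        else:
            self._expiry.pop(field, None)  # no TTL: field is persistent

    def hget(self, field):
        deadline = self._expiry.get(field)
        if deadline is not None and time.monotonic() >= deadline:
            # Lazily evict only the expired field, as a cache would.
            del self._data[field]
            del self._expiry[field]
        return self._data.get(field)

profile = HashWithFieldTTL()
profile.hset("name", "Ada")                   # static data: never expires
profile.hset("auth_token", "t0k3n", ttl=0.1)  # short-lived: expires quickly
print(profile.hget("auth_token"))  # -> t0k3n (TTL still live)
time.sleep(0.15)
print(profile.hget("auth_token"))  # -> None: the token expired...
print(profile.hget("name"))        # -> Ada: ...but the rest of the hash survives
```

The point of the example is the last two lines: one field vanishes on schedule while its siblings, and the key itself, remain intact.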
Apache Kafka 4.1.0 Release and the Rise of Diskless Kafka
Apache Kafka is the de facto standard for open source data streaming, serving as the central nervous system for modern event-driven architectures. It is primarily used for building real-time data pipelines, streaming analytics, log aggregation, and powering transactional systems that require high-throughput, low-latency data movement.
The Apache Kafka community announced the availability of Kafka 4.1.0 on September 4, 2025. This incremental release delivered several key features, including a new Preview version of Queues for Kafka (KIP-932). This feature enables queue-like semantics, allowing multiple consumers to cooperatively process records from the same partitions with individual message acknowledgements, which greatly simplifies use cases like distributed work queues. Other improvements include early access to a new Streams Rebalance Protocol and native support for the OAuth JWT-Bearer grant type for stronger client security.
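Conceptually, share groups let several consumers pull records from the same partition and acknowledge them individually, so a record whose consumer fails can be redelivered to another member. The toy in-memory sketch below illustrates that dispatch/ack loop; it is plain Python for intuition only, not the Kafka 4.1 share-consumer client API:

```python
from collections import deque

class SharePartition:
    """Toy model of KIP-932 semantics: one partition, per-record acks,
    records handed to whichever group member polls next."""

    def __init__(self, records):
        self._available = deque(records)  # not yet delivered (or released back)
        self._in_flight = {}              # offset -> record awaiting an ack

    def poll(self):
        """Hand the next available record to a consumer."""
        if not self._available:
            return None
        offset, record = self._available.popleft()
        self._in_flight[offset] = record
        return offset, record

    def ack(self, offset):
        """Consumer processed the record successfully."""
        del self._in_flight[offset]

    def release(self, offset):
        """Consumer failed: make the record available for redelivery."""
        self._available.append((offset, self._in_flight.pop(offset)))

part = SharePartition(enumerate(["job-a", "job-b", "job-c"]))
o1, r1 = part.poll()  # consumer 1 takes job-a
o2, r2 = part.poll()  # consumer 2 takes job-b, from the SAME partition
part.ack(o1)          # consumer 1 succeeds
part.release(o2)      # consumer 2 crashes; job-b goes back in the queue
print(part.poll())    # -> (2, 'job-c')
print(part.poll())    # -> (1, 'job-b'), redelivered to another member
```

Contrast this with a classic consumer group, where each partition is pinned to exactly one consumer and progress is tracked by a single committed offset rather than per-record acknowledgements.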
The Architectural Shift to Diskless Kafka
Beyond these core features, Kafka’s future is defined by the move toward Diskless Topics, a cloud-native architectural shift (KIP-1150) that reached critical momentum this year.
In traditional Kafka, data is stored and replicated across local broker disks, incurring high costs from expensive block storage and substantial inter-Availability Zone (AZ) network fees for replication. Diskless Kafka fundamentally decouples compute (brokers) from storage by allowing new topics to write data directly to cloud object storage (like Amazon S3).
This shift offers profound benefits: Massive Cost Reduction (eliminates costly cross-AZ replication traffic and replaces local disks with economical object storage, potentially reducing total cost of ownership by up to 80%); and True Elasticity (brokers become largely stateless, enabling instantaneous scaling without lengthy data rebalancing). While Diskless Topics introduce a slight latency trade-off (200-400ms for cost-optimized topics vs. sub-100ms for traditional topics), this is acceptable for many workloads like logging and analytics, transforming Kafka into a highly elastic, cost-efficient, cloud-native event backbone.
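The replication-cost argument can be put in rough numbers. The back-of-the-envelope sketch below uses hypothetical prices; every $/GB figure is an illustrative assumption, not a quote, and real cloud pricing varies by provider and region (it also ignores object-storage request fees):

```python
# Back-of-the-envelope: classic replicated topics vs. object-storage-backed
# topics. All prices are ILLUSTRATIVE ASSUMPTIONS, not real quotes.
CROSS_AZ_PER_GB = 0.02              # assumed $/GB for inter-AZ transfer
BLOCK_STORAGE_PER_GB_MONTH = 0.08   # assumed $/GB-month for broker disks
OBJECT_STORAGE_PER_GB_MONTH = 0.02  # assumed $/GB-month for object storage

def classic_monthly_cost(ingest_gb, retained_gb, replicas=3):
    # Each ingested GB is copied to (replicas - 1) other AZs, and every
    # replica keeps a full copy on local block storage.
    transfer = ingest_gb * (replicas - 1) * CROSS_AZ_PER_GB
    storage = retained_gb * replicas * BLOCK_STORAGE_PER_GB_MONTH
    return transfer + storage

def diskless_monthly_cost(ingest_gb, retained_gb):
    # Brokers write straight to object storage: no cross-AZ replication
    # traffic, one durable copy priced at object-storage rates.
    return retained_gb * OBJECT_STORAGE_PER_GB_MONTH

print(classic_monthly_cost(10_000, 5_000))   # -> 1600.0
print(diskless_monthly_cost(10_000, 5_000))  # -> 100.0
```

Even with these made-up rates, the structural point holds: the replication-traffic term disappears entirely, and the storage term is repriced, which is where the large TCO reductions claimed for diskless topics come from.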

