Writing
Notes on systems & speed
Deep dives into Rust experiments, Apache DataFusion internals, and the mechanics behind real speedups — with benchmarks and flamegraphs, not vibes.
- { } DataFusion
Inside a DataFusion Parquet Scan: Skipping Page Index I/O When Statistics Already Decide
How a Parquet scan reads a file end to end — footer, row groups, page index, bloom filters — and why PR #22857 stops loading page index metadata when row-group statistics already prove the filter.
Read →
- ⌗ Trino
How a Deadlock Froze Blinkit's Supply Chain
A silent deadlock in our query engine stalled inventory replenishment with no error, no crash — just infinite waiting. How we traced it to a shared thread pool in Trino's Hudi connector and fixed it upstream.
Read on Lambda by Blinkit →
- { } DataFusion
Zero-Copy Strings in Apache DataFusion: How StringViewArray Boosted Performance by 8%
How StringViewArray cut the copy tax on string operations and lifted ClickBench performance by 8%.
Read →
- ~ Async
Async Runtimes vs Threads in Rust: Which Is Better, and When?
Tokio wins on tiny and waiting-heavy workloads; threads catch up on pure CPU. A measured guide to when each model fits.
Read →
- ⇄ Concurrency
Atomics vs Mutex in Rust: Why Mutex Won Under Heavy Contention
Why a mutex beat atomics under heavy contention — with flamegraphs and a counterintuitive takeaway.
Read →
- ⧉ CPU
The Hidden Performance Killer: How 56 Bytes of Padding Made My Rust Code 4.6x Faster
How 56 bytes of padding turned a 749ms benchmark into 163ms — the hidden cost of cache-line false sharing.
Read →