Axiom — Engineering Blog

Notes on distributed systems, infrastructure, and the craft of software engineering.

May 12, 2026

Scaling Postgres beyond a single primary

After hitting 50k QPS, our database started showing strain. Here's how we approached read replicas and connection pooling, and how we eventually moved to a partitioned setup.

postgres, infra

May 7, 2026

The hidden cost of event-driven architecture

Event-driven systems promise loose coupling, but the operational complexity is real. Debugging cascading failures across 30 services taught us when synchronous calls are actually fine.

architecture

April 28, 2026

Why we capped our observability budget

Datadog bill creeping toward six figures? You're not alone. We cut our spend by 60% without losing visibility.

observability, finops

April 21, 2026

Anatomy of a 4-hour outage

A misconfigured Kafka consumer triggered cascading failures across our entire payment pipeline. Here is the full timeline, including the mistakes we made.

incident