Scaling Postgres beyond a single primary
After hitting 50k QPS, our database started showing strain. Here's how we approached read replicas and connection pooling, and eventually moved to a partitioned setup.
postgres, infra
Notes on distributed systems, infrastructure, and the craft of software engineering.
Event-driven systems promise loose coupling, but the operational complexity is real. Debugging cascading failures across 30 services taught us when synchronous calls are actually fine.
Datadog bill creeping toward 6 figures? You're not alone. We cut spend by 60% without losing visibility.
A misconfigured Kafka consumer triggered cascading failures across our entire payment pipeline. Full timeline, mistakes made.