Amazon needed a database that would keep working even when servers failed, networks split, or whole data centres went dark — because being down for even a minute meant lost sales at massive scale. Traditional databases solved uncertainty by refusing to answer: if they were not sure of the state, they would return an error or block. Dynamo did the opposite: always accept writes, always answer reads, and sort out any contradictions later. The technique that made this work — tracking which version of data is newer using vector clocks — became a foundational idea in distributed systems. Dynamo itself never became a public product, but its ideas live on in Cassandra, Amazon's public DynamoDB, Riak, and in every engineering team that has argued about consistency models.
Dynamo: Amazon's Highly Available Key-Value Store
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, Werner Vogels
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world. Even the slightest outage has significant financial consequences and impacts customer trust. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon's core services use to…
“The paper that gave the industry vocabulary for hard trade-offs — and proved that 'always wrong sometimes' beats 'sometimes unavailable'.”
Designing any distributed data store where availability must survive node failures and network partitions — shopping carts, session state, user preferences, any write-heavy workload that can tolerate eventual consistency.
Your use case requires strong consistency: financial transactions, inventory deduction, leader election, anything where stale reads cause correctness bugs. Dynamo deliberately lets replicas diverge for a time, for example during node failures or network partitions, and leaves conflicting versions to be reconciled later.
Traditional relational databases with strong consistency. Dynamo explicitly chose availability over consistency in the CAP trade-off, enabling a class of systems (Cassandra, Riak, Voldemort) that shaped distributed systems for a decade.
Terminology
consistent hashing
A partitioning scheme mapping keys and nodes to a ring, minimising redistribution when cluster membership changes.
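The ring can be illustrated with a minimal sketch. The class name `HashRing`, the helper `ring_position`, and the node names are hypothetical, chosen only for illustration. Virtual nodes and replication are left out to keep it short.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a string to a position on the ring (MD5 used here for illustration)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest(), "big")


class HashRing:
    """Hypothetical single-token ring: each node owns one position."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def remove(self, node: str) -> None:
        pos = ring_position(node)
        self._positions.remove(pos)
        del self._owner[pos]

    def lookup(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_right(self._positions, ring_position(key))
        # Wrap around to the first node when the key falls past the last one.
        return self._owner[self._positions[idx % len(self._positions)]]
```

In this sketch, adding a node only reassigns keys that fall between the new node and its predecessor on the ring. Every other key keeps its owner, which is the "minimising redistribution" property in the definition.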
eventual consistency
A consistency model guaranteeing that, if no new updates are made, all replicas eventually converge to the same value; it makes no guarantee about how long convergence takes.
hinted handoff
A mechanism in which a stand-in node stores writes destined for an unavailable node, together with a hint naming the intended target, and forwards them once that target recovers.
merkle tree
A hash tree used to efficiently detect divergence between replicas: comparing hashes from the root downward skips identical subtrees and narrows the search to a diverging key range in O(log n) comparisons.
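The top-down comparison can be sketched as follows. The functions `build` and `diff` are hypothetical names, SHA-256 is an arbitrary choice of hash, and the sketch assumes both replicas hold the same power-of-two number of leaves in the same order.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build(leaves):
    """Return tree levels, from leaf hashes (level 0) up to the root (last level).

    Assumes len(leaves) is a power of two, to keep the sketch short.
    """
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff(a, b, level=None, index=0):
    """Return indices of leaves whose hashes differ, skipping equal subtrees."""
    if level is None:
        level = len(a) - 1  # start the comparison at the root
    if a[level][index] == b[level][index]:
        return []           # identical subtree: nothing below needs checking
    if level == 0:
        return [index]
    return (diff(a, b, level - 1, 2 * index)
            + diff(a, b, level - 1, 2 * index + 1))
```

With eight leaves and one stale entry, `diff` visits a single path from the root to the differing leaf rather than comparing every entry.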
quorum
The minimum number of nodes that must acknowledge a read or write for the operation to succeed.
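In Dynamo-style systems the quorum sizes are usually written as N (replicas per key), R (read acknowledgements), and W (write acknowledgements). Setting R + W > N forces every read quorum to overlap every write quorum. The sketch below, with hypothetical function names, checks that rule against a brute-force enumeration of replica subsets.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The arithmetic rule: read and write quorums must overlap when R + W > N."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: does every r-subset of replicas share a member with every w-subset?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )
```

For N = 3, the setting R = 2, W = 2 satisfies the rule, while R = 1, W = 1 trades the overlap guarantee for lower latency.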
sloppy quorum
A quorum satisfied by any available nodes rather than designated replicas, accepting temporary inconsistency for availability during failures.
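Sloppy quorum and hinted handoff work together, which a small sketch can show. The function `pick_write_targets` and the node names are hypothetical. The sketch assumes the intended replicas for a key are the first `n` nodes in ring order, and that stand-ins are taken from the nodes that follow.

```python
def pick_write_targets(ring_order, n, healthy):
    """Choose up to n healthy nodes for a write, walking the ring in order.

    Returns (node, hint) pairs. hint is None for an intended replica, or the
    name of the unavailable replica that the stand-in node is covering for.
    """
    intended = ring_order[:n]
    down = [node for node in intended if node not in healthy]
    targets = [(node, None) for node in intended if node in healthy]
    for node in ring_order[n:]:
        if not down:
            break
        if node in healthy:
            # The stand-in keeps the write plus a hint naming the real owner.
            targets.append((node, down.pop(0)))
    return targets
```

With ring order A, B, C, D, E and n = 3, a write issued while A is unreachable goes to B, C, and D, and D's copy carries the hint "A" so that it can be handed back when A recovers.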
vector clock
A list of (node, counter) pairs tracking write causality, which allows detection of concurrent, conflicting versions.
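A minimal sketch of the comparison, representing a clock as a dict from node name to counter. The function names are hypothetical, and the node names `Sx`, `Sy`, `Sz` are illustrative.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if a has seen everything b has (a >= b on every node counter)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"  # neither descends from the other: a conflict to reconcile
```

If two versions both start from a write handled by `Sx` and are then updated independently through `Sy` and `Sz`, neither clock descends from the other, so `compare` reports them as concurrent and both versions must be kept until they are reconciled.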