DynamoDB for System Design

### What is DynamoDB?

DynamoDB is a fully managed NoSQL key-value database service provided by AWS. It automatically handles operational tasks including hardware provisioning, configuration, patching, and scaling, and it scales to handle massive amounts of traffic and data. It is a proprietary service, not open source.

### DynamoDB Data Model

Data in DynamoDB is organized into tables (similar to SQL), and each table holds multiple items. However, DynamoDB does not enforce a schema beyond the primary key, so items in the same table can have different attributes. Each item can contain up to 400 KB of data, and each item must have a primary key. There are two choices for the primary key:
- Partition Key
- Composite Key (Partition Key + Sort Key)
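
To make the "no schema beyond the primary key" point concrete, here is a minimal sketch using plain Python dicts as stand-ins for items. The attribute names (`user_id`, `created_at`, and so on) and the `has_primary_key` helper are hypothetical, chosen only for illustration; this is not the DynamoDB API.

```python
KEY_SCHEMA = ("user_id", "created_at")  # hypothetical partition key and sort key


def has_primary_key(item: dict, key_schema=KEY_SCHEMA) -> bool:
    """The only structural rule modelled here: every key attribute must exist."""
    return all(name in item for name in key_schema)


items = [
    # Two items with different non-key attributes can live in the same table.
    {"user_id": "u1", "created_at": "2026-01-01", "email": "a@example.com"},
    {"user_id": "u2", "created_at": "2026-01-02", "tags": {"beta", "admin"}, "age": 31},
    # This one lacks the primary key, so it would be rejected.
    {"email": "missing-key@example.com"},
]

accepted = [item for item in items if has_primary_key(item)]
```

Only the first two items pass the check, even though they share no attributes other than the key.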

A **partition key** is a single attribute whose hash determines which physical partition stores the item. When the partition key is the entire primary key, its value must also be unique per item. A partition is not just a region of memory; it is a chunk of SSD (Solid State Drive) storage, replicated across multiple availability zones. Each partition can hold up to 10 GB of data and can handle up to 1,000 writes per second and 3,000 reads per second (measured in write and read capacity units). A partition splits when it holds more than 10 GB of data or needs more throughput than a single partition can serve. DynamoDB hashes the partition key and uses consistent hashing to spread items evenly across partitions, which works well as long as the key values themselves are well distributed.
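
The hashing idea can be sketched with a toy consistent-hash ring. This is an illustration under stated assumptions, not DynamoDB's internal implementation: the MD5 hash, the `HashRing` class, the virtual-node count, and the partition names are all made up for the example.

```python
import bisect
import hashlib


def hash_key(value: str) -> int:
    """Map a string to a position on the ring (the hash choice is illustrative)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """A toy consistent-hash ring; partition names are hypothetical."""

    def __init__(self, partitions, vnodes=64):
        # Each partition owns several points ("virtual nodes") on the ring
        # so that keys spread more evenly.
        self._points = sorted(
            (hash_key(f"{name}#{i}"), name)
            for name in partitions
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def partition_for(self, partition_key: str) -> str:
        # Find the first ring point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._hashes, hash_key(partition_key))
        return self._points[idx % len(self._points)][1]


ring = HashRing(["partition-a", "partition-b", "partition-c"])
placement = {key: ring.partition_for(key) for key in ("user#1", "user#2", "user#3")}
```

The property that makes this attractive is that adding a partition only moves the keys that land on the new partition's ring points; every other key stays where it was.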

A **composite key** is composed of a **partition key** and a **sort key**. With a composite key, many items can share the same partition key value as long as their sort keys differ; the combination of the two must be unique. Items that share a partition key value are kept ordered by sort key (conceptually something like a B-tree), which enables efficient range queries. Note that a partition key is always required, but a sort key (making a composite key) is optional.

The composite key, combining hashed partitions with locally sorted data, is a fast and scalable pattern: the hash locates the partition, and the items inside it are already stored in sort-key order, so a range query reads a contiguous slice. Note that DynamoDB can hold several different data types, including strings, numbers, binary, booleans, null, sets (string, number, and binary), and nested types (lists and maps).
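
As a rough illustration of why this pattern is fast, the sketch below keeps each partition's items in sort-key order and answers range queries with a binary search. `ToyTable` and its methods are hypothetical and in-memory only; a sorted list stands in for whatever ordered structure the real service uses.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a composite primary key."""

    def __init__(self):
        # partition key -> parallel lists of sort keys and items, kept sorted
        self._sort_keys = defaultdict(list)
        self._items = defaultdict(list)

    def put(self, pk, sk, item):
        keys, items = self._sort_keys[pk], self._items[pk]
        idx = bisect.bisect_left(keys, sk)
        if idx < len(keys) and keys[idx] == sk:
            items[idx] = item  # same primary key: overwrite the existing item
        else:
            keys.insert(idx, sk)
            items.insert(idx, item)

    def query_between(self, pk, low, high):
        """Return items in partition pk with low <= sort key <= high."""
        keys = self._sort_keys.get(pk, [])
        lo = bisect.bisect_left(keys, low)
        hi = bisect.bisect_right(keys, high)
        return self._items.get(pk, [])[lo:hi]


table = ToyTable()
table.put("user#1", "2026-01-03", {"order": "C"})
table.put("user#1", "2026-01-01", {"order": "A"})
table.put("user#1", "2026-01-02", {"order": "B"})
table.put("user#2", "2026-01-02", {"order": "Z"})

first_two_days = table.query_between("user#1", "2026-01-01", "2026-01-02")
```

Here `first_two_days` holds orders A and B in sort-key order, and the items for `user#2` are never touched.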

### Secondary Indexes

DynamoDB also supports secondary indexes, which let you query by attributes other than the table's primary key. This gives customers flexibility and ease of use. The two indexes we will talk about are the **Global Secondary Index (GSI)** and the **Local Secondary Index (LSI)**.

The **Global Secondary Index (GSI)** is an index with its own partition key and an optional sort key, just like the primary key options described above. Its partition key can differ from the table's partition key, which lets users query on attributes other than the table's partition key. Because a GSI partitions on different key data, its entries are stored on entirely different physical partitions and are replicated separately. Each GSI behaves like a separate table with its own partitioning scheme, and it is updated asynchronously after the main table is written, so reads from it are *eventually consistent*. It uses the same hashing mechanism as the main table.

The **Local Secondary Index (LSI)** is an index with the same partition key as the table but a different sort key, stored in the same partition. This enables users to sort by multiple fields within a partition and run range queries on each. Note that the Local Secondary Index keeps a B-tree structure separate from the one for the primary sort key. The index is updated synchronously with main table writes, so it supports strongly consistent reads.
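
The difference between the two index types is easiest to see in a table definition. The sketch below builds keyword arguments in the shape that boto3's `create_table` call accepts; the table name, attribute names, and index names are hypothetical, and the actual call is left as a comment because it needs AWS credentials.

```python
def build_orders_table_definition():
    """Return keyword arguments shaped for boto3's DynamoDB create_table call."""
    return {
        "TableName": "Orders",  # hypothetical table
        "BillingMode": "PAY_PER_REQUEST",
        # Only attributes used in some key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "order_total", "AttributeType": "N"},
            {"AttributeName": "order_status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
            {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
        ],
        # LSI: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_total",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_total", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        # GSI: its own partition key, so it is partitioned independently.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "by_status",
                "KeySchema": [
                    {"AttributeName": "order_status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


definition = build_orders_table_definition()
# With credentials configured, this could be passed along as:
#   boto3.client("dynamodb").create_table(**definition)
```

A low-cardinality attribute such as an order status is used here only to keep the example short; in practice a GSI partition key with few distinct values tends to concentrate traffic on a few partitions.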

### Accessing Data

There are two main ways to read multiple items: a **Scan Operation** and a **Query Operation**. A scan reads every item in the table (or index) and returns a paginated response, which is inefficient for large datasets. A query retrieves items by key: it requires an exact partition key value, optionally narrowed by a condition on the sort key, and it can target either the table or a secondary index. Because a query only reads the matching partition, it is far more efficient than a scan.
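
The cost difference can be sketched by counting how many items each access pattern has to examine. The `scan` and `query` functions below are hypothetical in-memory stand-ins, not the DynamoDB API, and the data set (50 users with 100 orders each) is made up for the example.

```python
def scan(partitions, predicate):
    """Examine every item in every partition, keeping those that match."""
    examined, matches = 0, []
    for items in partitions.values():
        for item in items:
            examined += 1
            if predicate(item):
                matches.append(item)
    return matches, examined


def query(partitions, partition_key):
    """Go straight to one partition and read only its items."""
    items = partitions.get(partition_key, [])
    return list(items), len(items)


# Hypothetical data: 50 partition key values with 100 items each.
partitions = {
    f"user#{u}": [{"user": f"user#{u}", "order": n} for n in range(100)]
    for u in range(50)
}

scan_matches, scan_examined = scan(partitions, lambda item: item["user"] == "user#7")
query_matches, query_examined = query(partitions, "user#7")
```

Both calls return the same 100 items, but the scan examines all 5,000 items to find them while the query examines only the 100 in the target partition.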

### CAP Theorem

DynamoDB supports two read consistency models, **eventual consistency** and **strong consistency**. The choice is made per read request (through the `ConsistentRead` parameter) rather than as a table-wide setting. DynamoDB is a distributed NoSQL database, so it replicates data across multiple storage nodes (usually 3 per partition) for high availability and durability. This replication is what gives us these two choices.

Eventual consistency is the default read model and is *Available* and *Partition Tolerant* (with respect to CAP). A write goes to the partition's primary replica first and is then propagated to the other replicas. An eventually consistent read can be served by any replica, so it may briefly return stale data until background replication brings that replica up to date.

Strong consistency in DynamoDB is where all reads reflect the most recent successful write. This comes at the cost of higher latency and lower availability. This strategy is *Consistent* and *Partition Tolerant* (with respect to CAP). It is supported only for base tables and Local Secondary Indexes. The way this works is that each DynamoDB partition has a leader node. All writes go to the leader, and strongly consistent reads are always served from the leader too, ensuring that the latest state is seen. The other replicas catch up shortly afterwards and provide durability. Note that you cannot do strongly consistent reads on Global Secondary Indexes, because they live on separate partitions that are updated asynchronously.
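
The leader and replica behaviour described above can be sketched with a small deterministic simulation. `ToyPartition` is a hypothetical model: replication only happens when `replicate()` is called, and eventually consistent reads are forced onto a follower so that the stale case is visible (in the real service they may be served by any replica).

```python
class ToyPartition:
    """One partition with a leader and two followers (hypothetical model)."""

    def __init__(self):
        self.leader = {}
        self.followers = [{}, {}]
        self._pending = []  # writes not yet applied to the followers

    def write(self, key, value):
        # All writes go through the leader.
        self.leader[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Stand-in for background replication catching the followers up.
        for key, value in self._pending:
            for follower in self.followers:
                follower[key] = value
        self._pending.clear()

    def read(self, key, consistent_read=False, follower_index=0):
        # Strongly consistent reads are served by the leader; eventually
        # consistent reads here come from a follower that may be lagging.
        if consistent_read:
            return self.leader.get(key)
        return self.followers[follower_index].get(key)


partition = ToyPartition()
partition.write("cart#1", "v1")
partition.replicate()
partition.write("cart#1", "v2")                          # followers still hold v1

stale = partition.read("cart#1")                         # eventually consistent
fresh = partition.read("cart#1", consistent_read=True)   # served by the leader
partition.replicate()
converged = partition.read("cart#1")                     # follower has caught up
```

In this run the eventually consistent read sees the old value until replication completes, while the strongly consistent read sees the new value immediately.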

### Scalability and Fault Tolerance

DynamoDB is an AWS-managed database, which gives us many advantages. It automatically shards data across partitions and routes requests to them, so we do not run our own load balancers. Global Tables add near-real-time replication across Regions. Because DynamoDB automatically replicates data across availability zones, it is highly available. We do not choose the number of replicas within a Region; what we configure is throughput (provisioned read and write capacity units, or on-demand mode) and, with Global Tables, which Regions hold a replica. Because each partition is served by a distributed group of nodes, DynamoDB uses quorum-based replication (voting) for data consistency and durability. Finally, DynamoDB encrypts data at rest and supports Virtual Private Cloud (VPC) endpoints for private access.
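
The quorum idea can be shown with a few lines of arithmetic. This is a generic majority-quorum sketch, assuming three replicas per partition as mentioned earlier; the function names are hypothetical and the real service's acknowledgement rules may differ in detail.

```python
def quorum_size(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def write_is_acknowledged(acks: int, replica_count: int = 3) -> bool:
    """A write counts as durable once a majority of replicas have persisted it."""
    return acks >= quorum_size(replica_count)


# With 3 replicas, 2 acknowledgements are enough, so one replica can be
# slow or unavailable without blocking writes.
needed = quorum_size(3)
one_replica_down = write_is_acknowledged(acks=2)
two_replicas_down = write_is_acknowledged(acks=1)
```

With three replicas a write needs two acknowledgements, so losing one replica does not block writes but losing two does.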
