CORE CONCEPTS DEEP DIVE

AWS 1x1 — DynamoDB

All the things you need to get you hooked on AWS’ holy grail of database solutions

Tobias Schmidt

Published in Towards AWS

7 min read · Oct 13, 2021

DynamoDB stands out as one of AWS’ most famous and widely used services. Especially if you’re working (or want to work) with a serverless technology stack, DynamoDB integrates perfectly with Lambda and other managed services like SQS and SNS.

In this article, I’ll give you insights into all the aspects of DynamoDB and why it’s different from other NoSQL solutions.

An overview:

Introduction: why care about DynamoDB
Provisioned vs On-Demand Capacity: both modes in detail
Basic Concepts: primary, partition, and range keys
Attributes & Types: what your documents contain
Retrieving Items: query vs scan
Race Conditions: how to handle concurrent writes
Expressions: adding constraints to operations
Indexes: advanced access patterns
Streams: attaching services to DynamoDB operations
Security: securing your data
Backups: layers for recovery
Global Tables: multi-region setups
Observability: tracking your tables

Let’s jump into it.

Introduction

Why should you care about DynamoDB? It’s managed, highly available & scales on-demand with low latencies.

To get you hooked: during Prime Day 2021, DynamoDB served 89.2 million requests per second at its peak.

I’ve learned other services by exploration and trial & error.

That’s totally legit and often works out. But DynamoDB is different. You’ll save yourself a lot of pain in the future if you dive deep in the beginning.

There’s a lot of value in taking your time to deeply explore DynamoDB before trying to build a production service with it.

Provisioned vs. On-Demand Capacity

You can choose between these two capacity modes and switch between them later (AWS limits how often you can switch).

provisioned: you specify the read & write capacity units for your table and are billed for them, whether you use them or not
on-demand: you pay per request

Which one should you pick?

Go with 𝗢𝗻-𝗗𝗲𝗺𝗮𝗻𝗱 if your traffic is unpredictable, as it scales automatically and you only pay for what you actually use.

With steady load or known patterns, pick 𝗣𝗿𝗼𝘃𝗶𝘀𝗶𝗼𝗻𝗲𝗱, as it’s almost 𝟳 𝘁𝗶𝗺𝗲𝘀 less expensive when your capacity is well utilized!
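Where does the “almost 7 times” come from? Here is a rough sketch, assuming the us-east-1 list prices from around the time of writing ($1.25 per million on-demand write request units, $0.00065 per provisioned WCU-hour) and items of up to 1 KB. Prices change, so treat the constants as placeholders and plug in current numbers.

```python
# Rough price comparison of on-demand vs. provisioned writes.
# ASSUMPTION: historic us-east-1 list prices; check the current pricing page.
ON_DEMAND_USD_PER_MILLION_WRITES = 1.25  # per million write request units
PROVISIONED_USD_PER_WCU_HOUR = 0.00065   # per write capacity unit and hour
SECONDS_PER_HOUR = 3600


def provisioned_usd_per_million_writes(utilization: float) -> float:
    """Effective provisioned price for one million writes (items up to 1 KB).

    One WCU sustains one such write per second, i.e. 3600 writes per hour when
    fully used. Unused capacity is still billed, so lower utilization raises
    the effective price per write.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    writes_per_wcu_hour = SECONDS_PER_HOUR * utilization
    return PROVISIONED_USD_PER_WCU_HOUR / writes_per_wcu_hour * 1_000_000


def on_demand_premium(utilization: float) -> float:
    """How many times more on-demand costs than provisioned at this utilization."""
    return ON_DEMAND_USD_PER_MILLION_WRITES / provisioned_usd_per_million_writes(utilization)


for u in (1.0, 0.5, 0.1):
    print(f"{u:.0%} utilized: on-demand costs {on_demand_premium(u):.2f}x as much")
```

With these assumed prices, the premium is about 6.9x at full utilization, halves at 50 %, and falls below 1x (on-demand becomes cheaper) at roughly 14 % average utilization. That is why spiky, unpredictable traffic favors on-demand.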

Your traffic can still vary with 𝗣𝗿𝗼𝘃𝗶𝘀𝗶𝗼𝗻𝗲𝗱, as you can create an auto-scaling configuration based on CloudWatch metrics to increase/decrease your capacity!

Doing this well isn’t easy, though.
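For provisioned tables, auto scaling is configured through Application Auto Scaling with a target-tracking policy. A minimal sketch that builds the two boto3 requests for write capacity; the table name, capacity bounds, and the 70 % target are placeholder assumptions, and boto3 is only imported when you actually apply the configuration.

```python
from typing import Tuple


def autoscaling_requests(table_name: str, min_wcu: int, max_wcu: int,
                         target_utilization: float = 70.0) -> Tuple[dict, dict]:
    """Build register_scalable_target and put_scaling_policy requests."""
    resource_id = f"table/{table_name}"
    dimension = "dynamodb:table:WriteCapacityUnits"
    register = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_wcu,  # never scale below this
        "MaxCapacity": max_wcu,  # never scale above this
    }
    policy = {
        "PolicyName": f"{table_name}-write-target-tracking",
        "ServiceNamespace": "dynamodb",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Keep consumed/provisioned write capacity around this percentage.
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
            },
        },
    }
    return register, policy


def apply(register: dict, policy: dict) -> None:
    import boto3  # imported lazily so the builders work without boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**register)
    client.put_scaling_policy(**policy)
```

Read capacity works the same way with the ReadCapacityUnits dimension and the DynamoDBReadCapacityUtilization metric.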

Free Tier Reminder

DynamoDB’s free tier includes 25 provisioned read & 25 provisioned write capacity units (plus 25 GB of storage), and unlike many other free tier offers it isn’t limited to your account’s first 12 months. So if you only have low traffic / few tables, always stick to 𝗣𝗿𝗼𝘃𝗶𝘀𝗶𝗼𝗻𝗲𝗱 to stay within it.

Basic Concepts

Unlike in SQL, a document in DynamoDB doesn’t have a fixed schema. The only thing the table defines is the primary key, which uniquely identifies each document.

A document can also have other attributes of different types.

Primary Keys

It’s your 𝘂𝗻𝗶𝗾𝘂𝗲 identifier and must be provided when inserting a new item.

There are two different types of primary keys:

simple: a single field, which is also your partition key
composite: built from your partition key and range key

Partition Keys

Internally, DynamoDB stores your items across multiple partitions.

Your partition key is run through a hash function whose result determines the partition an item is stored in. A good partition key has many distinct values that are accessed evenly, so that items and requests are spread equally across partitions.

Why is that important? Your provisioned read & write capacity units will be distributed among partitions.

If your items aren’t well distributed, your requests get throttled much more easily, because you end up with 𝗵𝗼𝘁 𝗽𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝘀 (partitions receiving a disproportionate share of the load).
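To see why distribution matters, here is a toy model. DynamoDB’s real hash function and partition layout are internal; this sketch simply uses MD5 modulo a fixed number of partitions (an illustrative assumption) and counts how many requests land on each partition.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Toy stand-in for DynamoDB's internal hashing: MD5 of the key, modulo N."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def load_per_partition(request_keys: Iterable[str], num_partitions: int) -> Counter:
    """Count how many requests hit each partition."""
    return Counter(partition_for(k, num_partitions) for k in request_keys)


# 1,000 requests spread over 1,000 distinct user IDs ...
good = load_per_partition([f"user#{i}" for i in range(1000)], num_partitions=4)
# ... versus 1,000 requests that all use the same key (e.g. today's date).
hot = load_per_partition(["2021-10-13"] * 1000, num_partitions=4)

print(sorted(good.values()))  # spread across all four partitions
print(hot)                    # every request lands on a single (hot) partition
```

In the second case, one partition has to absorb all the traffic while the capacity assigned to the others sits idle.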

Range Keys

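Instead of using only a partition key as your primary key, you can use a composite key.

It adds a range key (also called sort key), so only the combination of partition key and range key has to be unique. Items with the same partition key are stored together, sorted by their range key, which is why the range key can be used in Expressions (e.g. key conditions with begins_with or between).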
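As a sketch, these are create_table requests (boto3 low-level client format) for a simple and a composite primary key. The table names and the key attribute names pk and sk are hypothetical. Note that only key attributes are declared in AttributeDefinitions; everything else stays schemaless.

```python
def table_definition(table_name: str, with_range_key: bool) -> dict:
    """Build a create_table request with a simple or composite primary key."""
    key_schema = [{"AttributeName": "pk", "KeyType": "HASH"}]  # partition key
    attributes = [{"AttributeName": "pk", "AttributeType": "S"}]
    if with_range_key:
        key_schema.append({"AttributeName": "sk", "KeyType": "RANGE"})  # range (sort) key
        attributes.append({"AttributeName": "sk", "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,  # only key attributes are declared up front
        "BillingMode": "PAY_PER_REQUEST",    # on-demand; PROVISIONED needs ProvisionedThroughput
    }


simple = table_definition("customers", with_range_key=False)
composite = table_definition("orders", with_range_key=True)
# With the composite key, ("customer#1", "order#1") and ("customer#1", "order#2")
# are two distinct items that share a partition key.
# boto3.client("dynamodb").create_table(**composite) would create the table.
```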

Attributes & Types

Besides your primary key, your document can contain other fields of different types. Among those are Strings (𝗦), Numbers (𝗡), Binaries (𝗕), Booleans (𝗕𝗢𝗢𝗟), Null (𝗡𝗨𝗟𝗟), Lists (𝗟), Maps (𝗠), and sets of strings, numbers, or binaries (𝗦𝗦, 𝗡𝗦, 𝗕𝗦).

Example: "Provider": { "S": "AWS" }
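In DynamoDB’s low-level format, every value is wrapped in a type descriptor like the one above. This simplified serializer sketch shows how plain Python values map to these descriptors. It is for illustration only and leaves out sets; boto3 ships a complete TypeSerializer in boto3.dynamodb.types.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a Python value in a DynamoDB type descriptor (simplified: no sets)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}         # numbers are transmitted as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (bytes, bytearray)):
        return {"B": bytes(value)}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    # floats are rejected here (as boto3 does); use Decimal instead
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(document: dict) -> dict:
    """Serialize a whole document, attribute by attribute."""
    return {name: to_attribute_value(v) for name, v in document.items()}


print(to_item({"Provider": "AWS", "Managed": True, "Tables": 3}))
# {'Provider': {'S': 'AWS'}, 'Managed': {'BOOL': True}, 'Tables': {'N': '3'}}
```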

Retrieving Items

This is where it gets interesting and where DynamoDB differs from SQL and other NoSQL solutions: you can only query on indexes, i.e. your partition key and, if there is one, your range key.

Everything else needs a scan.

How do queries and scans differ?

A scan reads through your entire table, looking for items that match your expression.

You’ll be billed for the items which are scanned, not the items that are retrieved.

With a query, you only pay for the items matching your key condition (a filter expression applied afterwards doesn’t reduce the cost), and it only reads the items of a single partition key. So generally speaking: query is way faster and cheaper!
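To make the difference tangible, here is a sketch under simplifying assumptions: reads are charged per 4 KB of data read (one RCU for a strongly consistent read, half an RCU for an eventually consistent one), and per-page rounding is ignored. The request dicts show the same lookup as a Query and as a Scan; the table, attribute names, and item counts are hypothetical.

```python
import math


def read_units(items_read: int, avg_item_kb: float, strongly_consistent: bool = False) -> float:
    """Approximate RCUs for a Query/Scan: total data read, rounded up to 4 KB blocks."""
    blocks = math.ceil(items_read * avg_item_kb / 4)
    return blocks if strongly_consistent else blocks / 2


# Query: reads only the items of one partition key that match the key condition.
query_request = {
    "TableName": "orders",
    "KeyConditionExpression": "pk = :customer AND begins_with(sk, :prefix)",
    "ExpressionAttributeValues": {
        ":customer": {"S": "customer#42"},
        ":prefix": {"S": "order#2021"},
    },
}

# Scan: reads every item in the table, then filters.
scan_request = {
    "TableName": "orders",
    "FilterExpression": "pk = :customer",
    "ExpressionAttributeValues": {":customer": {"S": "customer#42"}},
}

# Hypothetical: 10 matching 1 KB orders in a table of 100,000 such orders.
print(read_units(10, 1.0))       # query: 1.5
print(read_units(100_000, 1.0))  # scan: 12500.0
```

Both requests return the same ten orders, but the scan pays for every item it touched along the way.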

Race Conditions

When multiple processes work on the same documents concurrently, race conditions can occur.

Example:

Process 1 reads Document A
Process 2 reads Document A
Process 1 writes Document A
Process 2 writes Document A

Process 2 overwrites the changes of Process 1, so we lose our first write.

DynamoDB has you covered with optimistic locking based on version numbers. With DynamoDB’s Data Mapper, you can mark a field as the version attribute.

Each update increments its value. Under the hood, a condition expression checks that the stored version still matches the one we expect!

This ensures that there were no intermediate writes, as they would have increased the version number.

If there was an intermediate write, our update fails with a ConditionalCheckFailedException. We can catch it and handle the conflict, for example by merging both updates.
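A minimal sketch of optimistic locking, assuming a numeric attribute named version (hypothetical). The first function builds the boto3 update_item request the way a mapper would. The in-memory table below is a local stand-in (not a real DynamoDB API) that replays the Process 1 / Process 2 race, so the example runs without AWS.

```python
from dataclasses import dataclass, field


def versioned_update(table: str, pk: str, status: str, expected_version: int) -> dict:
    """update_item request that only succeeds if nobody bumped the version in between."""
    return {
        "TableName": table,
        "Key": {"pk": {"S": pk}},
        "UpdateExpression": "SET #status = :status, #version = :next",
        "ConditionExpression": "#version = :expected",
        "ExpressionAttributeNames": {"#status": "status", "#version": "version"},
        "ExpressionAttributeValues": {
            ":status": {"S": status},
            ":next": {"N": str(expected_version + 1)},
            ":expected": {"N": str(expected_version)},
        },
    }


class ConditionalCheckFailed(Exception):
    """Local stand-in for DynamoDB's ConditionalCheckFailedException."""


@dataclass
class InMemoryTable:
    items: dict = field(default_factory=dict)

    def read(self, pk: str) -> dict:
        return dict(self.items[pk])  # a copy, like a fresh GetItem

    def write(self, pk: str, item: dict, expected_version: int) -> None:
        # Same rule as the ConditionExpression above: reject stale versions.
        if self.items[pk]["version"] != expected_version:
            raise ConditionalCheckFailed(pk)
        self.items[pk] = {**item, "version": expected_version + 1}


table = InMemoryTable({"doc-a": {"status": "new", "version": 1}})
p1 = table.read("doc-a")  # Process 1 reads Document A
p2 = table.read("doc-a")  # Process 2 reads Document A
table.write("doc-a", {**p1, "status": "p1"}, p1["version"])  # succeeds, version -> 2
try:
    table.write("doc-a", {**p2, "status": "p2"}, p2["version"])  # stale version 1
    conflict = False
except ConditionalCheckFailed:
    conflict = True  # re-read, merge, and retry here
print(table.items["doc-a"], conflict)  # {'status': 'p1', 'version': 2} True
```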

Expressions

With expressions, you can refine your operations, for example by specifying conditions that must be met before a statement is actually executed.

Types of expressions:

Condition Expression
Projection Expression
Update Expression
Key Condition Expression
Filter Expression

Example: Condition Expressions check for conditions that have to be met before applying an update to a document.

Build these with familiar comparators such as equals (=), not equals (<>), greater than (>), or greater than or equal to (>=), as well as functions like attribute_exists or begins_with.
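As an illustration, here is a hypothetical inventory table: the update only goes through if the item exists and enough stock is left, and a projection expression limits which attributes a read returns. These are boto3 low-level request dicts; table, attribute names, and values are made up.

```python
def reserve_stock(sku: str, quantity: int) -> dict:
    """update_item request: decrement stock only if enough units are left."""
    return {
        "TableName": "inventory",
        "Key": {"pk": {"S": sku}},
        # Update expression: how to modify the item
        "UpdateExpression": "SET #stock = #stock - :qty",
        # Condition expression: if it evaluates to false, DynamoDB rejects the
        # write with a ConditionalCheckFailedException
        "ConditionExpression": "attribute_exists(pk) AND #stock >= :qty",
        "ExpressionAttributeNames": {"#stock": "stock"},
        "ExpressionAttributeValues": {":qty": {"N": str(quantity)}},
    }


def read_name_and_price(sku: str) -> dict:
    """get_item request returning only two attributes."""
    return {
        "TableName": "inventory",
        "Key": {"pk": {"S": sku}},
        # Projection expression: which attributes to return
        "ProjectionExpression": "#name, #price",
        # Placeholders avoid clashes with reserved words such as NAME
        "ExpressionAttributeNames": {"#name": "name", "#price": "price"},
    }


# boto3.client("dynamodb").update_item(**reserve_stock("sku-1", 2))
```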

Indexes

As we’ve learned, you can only query on your partition and range keys. But surely that can’t be everything? You’re right: you can create secondary indexes, which specify alternative key structures.

Those can also be used to query your items. There are two different types of indexes:

• Local Secondary Index (LSI): needs the same hash/partition key as the table, but an alternative range key; it can only be created together with the table

• Global Secondary Index (GSI): partition & range key can both differ from the table’s

Both give us more flexible access patterns.

More things to know about Secondary Indexes:

• no uniqueness requirement for the primary keys of your secondary indexes

• the key attributes of your secondary index are optional; items without them simply don’t appear in the index

• the number of secondary indexes is limited per table (LSI: 5, GSI: 20 by default)

Also, you can specify which attributes are projected to your secondary index.

• KEYS_ONLY — only the (underlying) keys

• ALL — the full item

• INCLUDE — only specific fields

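Put thought into this: projecting more attributes increases the index’s storage and write costs, while projecting too few can force extra reads from the base table.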
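A sketch of a table with one GSI using an INCLUDE projection, followed by a query against that index. The index name, its key attributes (customer_id, created_at), and the projected total field are hypothetical; the requests use boto3’s low-level format.

```python
orders_table = {
    "TableName": "orders",
    "BillingMode": "PAY_PER_REQUEST",
    "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],  # pk = order ID
    # Every attribute used as a key in the table *or* in an index must be declared.
    "AttributeDefinitions": [
        {"AttributeName": "pk", "AttributeType": "S"},
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "by-customer",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "created_at", "KeyType": "RANGE"},
            ],
            # INCLUDE: index keys, table keys, plus the listed attributes.
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["total"]},
        }
    ],
}

# Query the index instead of the table by passing IndexName.
customer_orders_query = {
    "TableName": "orders",
    "IndexName": "by-customer",
    "KeyConditionExpression": "customer_id = :c AND created_at >= :since",
    "ExpressionAttributeValues": {
        ":c": {"S": "customer#42"},
        ":since": {"S": "2021-10-01"},
    },
}
```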

Streams

DynamoDB Streams are another great feature that allows you to invoke other services whenever items are created, updated, or deleted.

Example: forward data to Elasticsearch via Lambda!
This also allows you to transform or filter the data on its way!
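A minimal Lambda handler sketch for that example: it walks the stream records, skips deletes (filtering), unwraps each new image into a plain document, and drops a field (manipulation). It assumes the stream’s view type includes NEW_IMAGE; the internal_notes attribute is hypothetical, and the handler returns the documents instead of sending them to a search cluster.

```python
from decimal import Decimal


def unwrap(attribute_value: dict):
    """Turn a stream value like {"S": "AWS"} into a plain Python value (simplified)."""
    type_key, raw = next(iter(attribute_value.items()))
    if type_key == "N":
        return Decimal(raw)  # numbers arrive as strings
    if type_key == "M":
        return {k: unwrap(v) for k, v in raw.items()}
    if type_key == "L":
        return [unwrap(v) for v in raw]
    if type_key == "NULL":
        return None
    return raw  # S, BOOL, B and sets are passed through unchanged here


def handler(event, context=None):
    documents = []
    for record in event.get("Records", []):
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # filter: ignore REMOVE events
        image = record["dynamodb"]["NewImage"]
        doc = {name: unwrap(value) for name, value in image.items()}
        doc.pop("internal_notes", None)  # manipulate: drop a private field
        documents.append(doc)
    return documents  # a real function would index these documents
```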

Security

DynamoDB tables are encrypted at rest by default, using an AWS owned key. You can also choose an AWS managed key or a customer-managed key (CMK) in KMS, which you are in control of.

As with other AWS services, access is fully covered via 𝗜𝗔𝗠.
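For example, a least-privilege policy could let a service read and write a single table and query its indexes, but nothing else (no scans, no deletes). A sketch that builds such a policy document; the region, account ID, and table name are placeholders.

```python
import json


def table_access_policy(region: str, account_id: str, table: str) -> dict:
    """IAM policy document scoped to one DynamoDB table and its indexes."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadWriteSingleTable",
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:Query",
                ],
                # The table itself plus all of its secondary indexes.
                "Resource": [table_arn, f"{table_arn}/index/*"],
            }
        ],
    }


print(json.dumps(table_access_policy("eu-central-1", "123456789012", "orders"), indent=2))
```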

Backups

DynamoDB is a managed service with built-in redundancy (your data is replicated across multiple Availability Zones), but this does not protect you from your own mistakes.

That’s why you have to keep backups.

The good part: AWS makes this easy for you.

On-Demand Backups

The easiest and cheapest option is to regularly trigger on-demand backups.

Just create a Lambda function that triggers backups of your table via the AWS SDK.

Add an EventBridge rule which invokes your function regularly.
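A minimal sketch of such a function, assuming the table name is passed in a hypothetical TABLE_NAME environment variable and an EventBridge schedule such as rate(1 day) invokes it. create_backup is the DynamoDB API call; backup names may only contain letters, digits, and the characters _ . -, hence the colon-free timestamp.

```python
import os
from datetime import datetime, timezone


def backup_name(table: str, now: datetime) -> str:
    """e.g. "orders-2021-10-13T08-00-00"; colons are not allowed in backup names."""
    return f"{table}-{now.strftime('%Y-%m-%dT%H-%M-%S')}"


def handler(event, context):
    import boto3  # available in the Lambda Python runtime

    table = os.environ["TABLE_NAME"]
    name = backup_name(table, datetime.now(timezone.utc))
    response = boto3.client("dynamodb").create_backup(TableName=table, BackupName=name)
    return response["BackupDetails"]["BackupArn"]
```

Old backups are kept until you delete them, so you may want a second step that cleans up backups past your retention period.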

Continuous backups via Point-in-Time Recovery (PITR)

Enabling PITR for your table will allow you to restore your table to any state within the last 35 days. It comes with additional costs.

Exporting Backups to S3

On-demand backups survive a table drop, but both options keep your data inside DynamoDB, in the same account and region. That’s why you should also export your data to S3 as an independent copy.

DynamoDB offers this export directly for tables with PITR enabled. Automate it via Lambda & EventBridge rules!
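The two API calls involved, sketched as boto3 requests: update_continuous_backups enables PITR, and export_table_to_point_in_time writes the export to S3. The table name, ARN, bucket, and prefix are placeholders; a scheduled Lambda function could run the export part regularly.

```python
def enable_pitr_request(table_name: str) -> dict:
    """update_continuous_backups request that turns on point-in-time recovery."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def export_request(table_arn: str, bucket: str, prefix: str) -> dict:
    """export_table_to_point_in_time request (exports the current state by default)."""
    return {
        "TableArn": table_arn,            # the export needs the ARN, not the table name
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": "DYNAMODB_JSON",  # or "ION"
    }


def run(table_name: str, table_arn: str, bucket: str) -> str:
    import boto3

    client = boto3.client("dynamodb")
    client.update_continuous_backups(**enable_pitr_request(table_name))
    response = client.export_table_to_point_in_time(
        **export_request(table_arn, bucket, f"exports/{table_name}")
    )
    return response["ExportDescription"]["ExportArn"]
```

Pointing the export at a bucket in a separate backup account adds protection against mistakes (or compromises) in the account that owns the table.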

Global Tables

It’s likely that you want to have your infrastructure distributed around the globe for redundancy and lower latencies.

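With Global Tables, DynamoDB keeps replicas of your table synchronized across the regions you choose. Each replica accepts reads and writes, and conflicting concurrent writes are resolved with "last writer wins".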
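With the current version of Global Tables (2019.11.21), adding a replica region is an update_table call. A sketch of the requests, with placeholder table and region names. Depending on your setup, DynamoDB Streams with new and old images may need to be enabled first, so check the Global Tables prerequisites before relying on this; the sketch conservatively adds one replica per call.

```python
def enable_stream_request(table_name: str) -> dict:
    """update_table request enabling a stream with new and old images."""
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


def add_replica_request(table_name: str, region: str) -> dict:
    """update_table request adding one replica region (Global Tables 2019.11.21)."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": region}}],
    }


# Hypothetical usage against a table that lives in eu-central-1:
# client = boto3.client("dynamodb", region_name="eu-central-1")
# client.update_table(**add_replica_request("orders", "us-east-1"))
```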

Observability

Regardless of whether you’re using on-demand or provisioned capacity, you should always know what’s going on: how much capacity is used, are there throttling events or spiking latencies, or is everything operating as expected?

CloudWatch offers a great set of metrics to get an overview of your tables. You’ll see:

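consumed read capacity units (ConsumedReadCapacityUnits)
consumed write capacity units (ConsumedWriteCapacityUnits)
throttles (e.g. ReadThrottleEvents, WriteThrottleEvents)
request latency (SuccessfulRequestLatency)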

You can configure CloudWatch alarms on throttles or for when RCU/WCU usage crosses certain thresholds.
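A sketch of a throttle alarm as a boto3 put_metric_alarm request: it fires as soon as any throttling event is recorded within a minute. The table name, SNS topic ARN, and the choice of a zero threshold are assumptions to adapt to your own tolerance.

```python
def throttle_alarm(table_name: str, sns_topic_arn: str,
                   metric: str = "ReadThrottleEvents") -> dict:
    """put_metric_alarm request that fires on any throttling within one minute."""
    return {
        "AlarmName": f"{table_name}-{metric}",
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric,  # or "WriteThrottleEvents"
        "Dimensions": [{"Name": "TableName", "Value": table_name}],
        "Statistic": "Sum",
        "Period": 60,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no data points means no throttles
        "AlarmActions": [sns_topic_arn],  # e.g. notify an SNS topic
    }


# boto3.client("cloudwatch").put_metric_alarm(**throttle_alarm("orders", topic_arn))
```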

Third-party tools like Dashbird.io help you monitor your DynamoDB tables and give you Well-Architected recommendations.

This helps you to find and fix errors & anomalies.

Guides

Content is mostly inspired by The DynamoDB Guide by @alexbdebrie!
Have a look & deep dive into DynamoDB — you won’t regret it.

Alex does a great job at explaining concepts in detail.

It doesn’t end here! There’s a lot more to learn.

My recommended resource for working with DynamoDB in a professional context: The DynamoDB Book — also by @alexbdebrie

Likely, it will save you a lot more money than it costs 🙌

Bonus: DynamoDB’s Whitepaper

Even if you’re not into reading papers, that one is worth reading anyway.
If you got some spare time, take a look.

Final thoughts

I’ve designed and built various microservice architectures with DynamoDB as their core database and never ran into any issues. It’s a highly available, fully-managed database that can be trusted in every way.

The only important thing is that you’re aware of its peculiarities and your access patterns, so that you can structure your data correctly.

Thank you for reading & I hope you’ve learned something new or that I’ve got you hooked on DynamoDB.