AWS Database Best Practices: RDS, DynamoDB, and ElastiCache Guide

Technical TL;DR

Database = Application Heart. Wrong choice = re-architecture.

Key takeaways:

**RDS for relational** (MySQL, PostgreSQL, SQL Server, Oracle, MariaDB, Aurora)

**DynamoDB for NoSQL** (single-digit ms latency, unlimited scale)

**ElastiCache for caching** (Redis, Memcached)

**Multi-AZ for HA** (automatic failover, synchronous replication)

**Read Replicas for scaling reads** (asynchronous, eventual consistency)

**Aurora for cloud-native** (5x MySQL performance, auto-storage)

---

1. Database Selection Framework

1.1 Relational vs. NoSQL Decision

Start with relational (RDS). Only use NoSQL if specific requirements demand it.

| Requirement | RDS (Relational) | DynamoDB (NoSQL) |

|-------------|------------------|------------------|

| **Structured data, fixed schema** | ✅ Best Choice | ❌ Not designed for |

| **Complex joins, transactions** | ✅ ACID compliant | ❌ No joins |

| **Ad-hoc queries, BI tools** | ✅ SQL standard | ❌ No query language |

| **Single-digit ms latency at any scale** | ❌ Performance degrades | ✅ Consistent performance |

| **Simple access patterns (key-value)** | ⚠️ Overkill | ✅ Optimized for |

| **Auto-scaling writes** | ❌ Manual scaling | ✅ Unlimited scale |

| **Eventual consistency acceptable** | ⚠️ Can be configured | ✅ Default behavior |

1.2 RDS Engine Selection

Choose the right RDS engine based on requirements and team skills.

|--------|----------|---------------------|------|

Recommendation:

```yaml

New Projects → Aurora (MySQL or PostgreSQL)

Existing MySQL/Postgres → Migrate to Aurora when convenient

SQL Server/Oracle → Migrate to Aurora only if business justification

```

---

2. Amazon RDS Best Practices

2.1 Multi-AZ Deployments: HA Requirement

All production RDS instances must be Multi-AZ.

Multi-AZ Architecture:

```yaml

Primary DB: us-east-1a

↓ Synchronous replication

Standby DB: us-east-1b (different AZ)

Automatic Failover:

- DNS updates to point to standby

- Typical failover: 30-60 seconds

- No data loss (synchronous replication)

- Zero manual intervention required

```

Configuration:

```yaml

Production: Multi-AZ = YES (non-negotiable)

Development: Single-AZ acceptable

Staging: Multi-AZ recommended (production-like)

```

2.2 Read Replicas: Scale Reads, Not Writes

Read replicas offload read traffic from primary database.

Use Cases:

```yaml

☐ Analytics/reporting queries (no impact on production)

☐ Geographic distribution (low-latency reads in other regions)

☐ Burst traffic (scale reads temporarily)

☐ Backup verification (queries on replica, not primary)

```

Important Limitations:

```yaml

☐ Asynchronous replication (potential data lag: milliseconds to seconds)

☐ Write traffic still goes to primary

☐ No automatic failover to replica (use Multi-AZ for HA)

☐ Maximum 5 read replicas per database

```

2.3 RDS Instance Sizing

Right-size RDS instances. Databases are often 2-3x oversized.

Sizing Methodology:

```yaml

1. Monitor CloudWatch metrics (CPU, memory, connections)

2. Target: 70-80% CPU utilization at peak

3. Use AWS Compute Optimizer for ML-driven recommendations

4. Test smaller instance classes in non-production

5. Consider burstable instances (db.t3) for dev/test

```

Instance Classes:

```yaml

db.t3 (burstable): Dev/test, low-traffic prod

db.m5 (general purpose): Most production workloads

db.r5 (memory-optimized): In-memory caches, high connection counts

db.x1g (memory-optimized): Large datasets (2TB+ in memory)

```

2.4 RDS Storage: Provisioned IOPS for Databases

Databases require provisioned IOPS (io1) for consistent performance.

Storage Types:

```yaml

gp2/gp3: General purpose SSD

- gp3: Recommended baseline (3,000 IOPS included)

- Cost-effective for most workloads

io1: Provisioned IOPS SSD

- Required for production databases

- Consistent IOPS regardless of volume size

- Cost: $0.125/GB/month + IOPS fee

```

IOPS Provisioning:

```yaml

Database IOPS = (Transactions/sec × IOPS per transaction)

Example: 2,000 TPS × 3 IOPS/transaction = 6,000 IOPS

Add 30% headroom: 7,800 IOPS provisioned

```

2.5 RDS Encryption: Required

All production RDS instances must be encrypted.

Configuration:

```yaml

☐ Enable encryption at creation (cannot add later)

☐ Use customer-managed KMS keys (CMKs)

☐ Separate keys per environment (dev/staging/prod)

☐ Snapshots inherit encryption from instance

☐ Read replicas require encryption if primary is encrypted

```

2.6 RDS Parameter Groups: Tuning

Parameter groups control database configuration.

Required Customizations:

```yaml

MySQL/PostgreSQL:

- max_connections: Based on instance memory

- shared_buffers: 25% of instance memory (Postgres)

- innodb_buffer_pool_size: 50-70% of instance memory (MySQL)

- log_slow_queries: Enable for performance analysis

- backup_retention_period: 7-35 days (compliance)

SQL Server:

- max degree of parallelism: Based on CPU cores

- cost threshold for parallelism: 30-50

- max server memory: Leave 20% for OS

```

2.7 RDS Backups: Point-in-Time Recovery

Backups are enabled by default. Configure retention for compliance.

Backup Strategy:

```yaml

Backup Window: Choose low-traffic period (backups impact performance)

Retention Period:

- Production: 35 days (PCI/HIPAA require 30+)

- Staging: 7 days

- Development: 1-7 days

Snapshot Window: Align with backup window

Automated Backups: Enable (required for PITR)

Manual Snapshots: Long-term retention (quarterly/yearly)

```

Recovery Testing:

```yaml

Quarterly: Restore snapshot to new RDS instance

Verify: Data integrity, application connectivity

Document: Restore procedure for incident response

```

---

3. Amazon Aurora Best Practices

3.1 When to Use Aurora

Aurora is a cloud-native database. 5x better performance than MySQL.

Aurora Advantages:

```yaml

☐ 5x MySQL performance, 3x PostgreSQL performance

☐ Auto-storage (up to 128 TB, no provisioning)

☐ Read scaling: Up to 15 replicas (vs. 5 for RDS)

☐ Faster failover: Typically < 30 seconds

☐ Serverless option: Scale-to-zero (cost savings)

☐ Global Database: Cross-region replication

```

Migration Justification:

```yaml

Migrate to Aurora if:

- Current RDS requires higher performance

- Manual storage management is burdensome

- Read scaling beyond 5 replicas needed

- Serverless patterns desired

```

3.2 Aurora Serverless: Cost Optimization

Aurora Serverless automatically scales compute capacity.

Use Cases:

```yaml

✓ Infrequent applications (dev/test, low-traffic prod)

✓ Unpredictable workloads (sudden spikes)

✓ Development and testing environments

```

Not Recommended For:

```yaml

✗ Production databases requiring consistent performance

✗ High connection counts (limited to 1,000 connections)

✗ Databases requiring long-running queries

```

3.3 Aurora Global Database: Multi-Region

Aurora Global Database replicates to up to 5 secondary regions.

Architecture:

```yaml

Primary Region: us-east-1 (read/write)

↓ Replication (typically < 1 second lag)

Secondary Region: us-west-2, eu-west-1 (read-only)

Failover: Promote secondary to primary (RTO: minutes)

```

Use Cases:

```yaml

☐ Global applications (low-latency reads worldwide)

☐ Disaster recovery (regional failure resilience)

☐ Data residency (local data storage requirements)

```

---

4. DynamoDB Best Practices

4.1 When to Use DynamoDB

DynamoDB is for workloads requiring unlimited scale and single-digit ms latency.

Ideal Use Cases:

```yaml

☐ User profiles, preferences, sessions

☐ Shopping carts, order histories

☐ IoT sensor data (high write throughput)

☐ Real-time bidding, gaming leaderboards

☐ Message queues, job queues

☐ Metadata stores, configuration stores

```

Anti-Patterns:

```yaml

✗ Complex joins (use RDS)

✗ Ad-hoc queries (use Redshift or RDS)

✗ Large documents (> 400 KB per item)

✗ Frequent item updates (use conditional writes)

```

4.2 DynamoDB Data Modeling

DynamoDB is NoSQL. Model data based on access patterns, not normalization.

Design Principles:

```yaml

1. Identify access patterns first (queries, reads, writes)

2. Design primary keys (partition + sort key)

3. Use GSIs (Global Secondary Indexes) for alternative access

4. Denormalize data (avoid joins, duplicate data)

5. Use composite keys for hierarchical data

```

Example: E-commerce Orders

```yaml

Primary Key:

- PK: ORDER#<order-id> (single item lookup)

- SK: METADATA#<order-id>

GSI 1 (User Orders):

- PK: USER#<user-id>

- SK: ORDER#<order-date>

GSI 2 (Product Orders):

- PK: PRODUCT#<product-id>

- SK: ORDER#<order-date>

```

4.3 DynamoDB Capacity Planning

Choose On-Demand or Provisioned capacity based on traffic patterns.

| Mode | Use Case | Cost |

|------|----------|------|

| **On-Demand** | Unknown/unpredictable traffic | 2.5x provisioned cost |

| **Provisioned** | Predictable traffic, cost optimization | Lower cost |

Provisioning Guidelines:

```yaml

RCU (Read Capacity Units):

- 1 RCU = 1 strongly consistent read (4 KB)

- 1 RCU = 2 eventually consistent reads (4 KB)

WCU (Write Capacity Units):

- 1 WCU = 1 write (1 KB)

Example: 100 reads/sec, 50 writes/sec

→ 50 RCU (eventual consistency)

→ 50 WCU (1 KB items)

```

4.4 DynamoDB Accelerator (DAX)

DAX is an in-memory cache for DynamoDB. Up to 10x read performance improvement.

When to Use DAX:

```yaml

✓ Read-heavy workloads (90%+ reads)

✓ Repeatable reads (hot data)

✓ Microsecond latency required

```

Not Recommended For:

```yaml

✗ Write-heavy workloads (no benefit)

✗ Eventually consistent reads acceptable (DAX costs money)

```

---

5. ElastiCache Best Practices

5.1 Redis vs. Memcached

Choose the right caching engine for your use case.

| Feature | Redis | Memcached |

|---------|-------|-----------|

| **Data types** | Strings, lists, sets, sorted sets, hashes | Strings only |

| **Persistence** | AOF/RDB snapshots (data survives restart) | No persistence (cache only) |

| **Replication** | Yes (high availability) | No |

| **Multi-AZ** | Yes (with replication) | No |

| **Partitioning** | Cluster enabled (sharding) | Client-side sharding |

| **Use Case** | Caching + data structures + pub/sub | Simple caching |

Recommendation:

```yaml

General Purpose → Redis (recommended default)

Simple Caching Only → Memcached (lower cost)

Pub/Sub, Locking → Redis (required)

```

5.2 Redis Cluster Mode

Redis Cluster enables horizontal scaling.

Cluster Mode Configuration:

```yaml

Cluster Mode Enabled:

- Up to 500 nodes

- Up to 500 shards (primary + replicas)

- Automatic sharding (hash slots)

- Best for: Large datasets (> 100 GB), high throughput

Cluster Mode Disabled:

- Single shard (primary + replicas)

- Simpler operations

- Best for: Smaller datasets, simpler requirements

```

5.3 ElastiCache Security

ElastiCache in VPC: Private subnets only.

Security Configuration:

```yaml

☐ VPC: Private subnets only (no public access)

☐ Security Groups: Restrict to specific EC2/IPs

☐ Encryption at rest: Enabled (Redis 6.0+)

☐ Encryption in transit: TLS (AUTH token required)

☐ AUTH Token: Require password authentication

☐ Redis version: Latest (security patches)

```

---

6. Database Migration

6.1 Migration Strategies

|------------------|----------|------------|----------|

6.2 AWS DMS (Database Migration Service)

DMS is the recommended migration tool for most scenarios.

DMS Architecture:

```yaml

Source Database (on-prem or cloud)

↓ DMS Replication Instance

Target Database (RDS, Aurora, Redshift)

Continuous Replication:

- Change Data Capture (CDC)

- Near-zero downtime (minutes of cutover)

- Schema conversion tool (SCT) for engine changes

```

Migration Process:

```yaml

1. Assess source database (AWS SCT)

2. Create target RDS instance

3. Create DMS replication instance

4. Create migration task (full load + CDC)

5. Validate data (row counts, checksums)

6. Cutover: Stop app, wait for replication lag, switch DNS

7. Decommission source after validation period

```

---

7. Monitoring and Alerting

Required Metrics

```yaml

RDS:

- CPU/Memory utilization

- Database connections (used vs. max)

- Read/write latency

- Freeable memory

- Disk queue depth (IOPS bottleneck)

DynamoDB:

- Consumed read/write capacity (vs. provisioned)

- Throttled requests (provisioning insufficient)

- System errors (5xx)

- Latency (average, p95, p99)

ElastiCache:

- CPU utilization

- Memory fragmentation ratio

- Evictions (cache insufficient size)

- Replication lag (Redis replicas)

```

Alerting

```yaml

☐ RDS CPU > 80% for 5 minutes (scale up)

☐ RDS connections > 80% of max (connection pooling needed)

☐ RDS free storage < 10 GB (storage emergency)

☐ DynamoDB throttles > 100/min (capacity issue)

☐ ElastiCache evictions > 1000/min (cache size too small)

```

---

8. Cost Optimization Checklist

Immediate Actions (Week 1)

[ ] Review RDS instance sizing (AWS Compute Optimizer)

[ ] Enable RDS reserved instances for steady-state workloads

[ ] Audit unused RDS snapshots (delete)

[ ] Review DynamoDB provisioned capacity (downsize if possible)

Short-Term (Month 1)

[ ] Migrate to Aurora Serverless for dev/test

[ ] Implement ElastiCache for read-heavy workloads

[ ] Set up automated backups (AWS Backup)

[ ] Configure DynamoDB on-demand for spiky workloads

Long-Term (Quarter 1)

[ ] Archive cold data to S3 (lower storage costs)

[ ] Implement read replicas for geographic distribution

[ ] Review and optimize expensive queries

[ ] Set up budget alerts via Cost Explorer

---

Summary: Database Excellence Pillars

1. **Right engine selection** (RDS for relational, DynamoDB for NoSQL)

2. **Multi-AZ for production** (automatic failover, HA)

3. **Read replicas for scaling** (offload read traffic)

4. **Encryption at rest** (required for compliance)

5. **Backups tested** (quarterly restore verification)

6. **Monitor performance** (CloudWatch metrics + alerting)

7. **Right-size instances** (avoid 2-3x oversizing)

8. **Use Aurora for cloud-native** (performance + auto-storage)

---

Need Help with Database Architecture?

Our database architects can design scalable, high-availability data architectures and migrate your databases with near-zero downtime.

Schedule a Free Database Consultation →

</a>

Internal Linking Strategy:

For **compute**, see [Database-Driven EC2 Sizing](/blog/aws-compute-best-practices)

For **security**, refer to [Database Encryption + IAM](/blog/aws-security-best-practices)

For **storage**, explore [EBS for Database Volumes](/blog/aws-storage-best-practices)

---

*Last updated: January 5, 2025*

AWS Database Best Practices: RDS, DynamoDB, and ElastiCache Guide

Technical TL;DR

1. Database Selection Framework

1.1 Relational vs. NoSQL Decision

1.2 RDS Engine Selection

2. Amazon RDS Best Practices

2.1 Multi-AZ Deployments: HA Requirement

2.2 Read Replicas: Scale Reads, Not Writes

2.3 RDS Instance Sizing

2.4 RDS Storage: Provisioned IOPS for Databases

2.5 RDS Encryption: Required

2.6 RDS Parameter Groups: Tuning

2.7 RDS Backups: Point-in-Time Recovery

3. Amazon Aurora Best Practices

3.1 When to Use Aurora

3.2 Aurora Serverless: Cost Optimization

3.3 Aurora Global Database: Multi-Region

4. DynamoDB Best Practices

4.1 When to Use DynamoDB

4.2 DynamoDB Data Modeling

4.3 DynamoDB Capacity Planning

4.4 DynamoDB Accelerator (DAX)

5. ElastiCache Best Practices

5.1 Redis vs. Memcached

5.2 Redis Cluster Mode

5.3 ElastiCache Security

6. Database Migration

6.1 Migration Strategies

6.2 AWS DMS (Database Migration Service)

7. Monitoring and Alerting

Required Metrics

Alerting

8. Cost Optimization Checklist

Immediate Actions (Week 1)

Short-Term (Month 1)

Long-Term (Quarter 1)

Summary: Database Excellence Pillars

Need Help with Database Architecture?

Need Help with Your AWS Infrastructure?

Related Articles

AWS AI/ML Best Practices: SageMaker and Bedrock Architecture Guide

AWS Compute Best Practices: The Complete Well-Architected Guide

AWS Networking Best Practices: VPC, CloudFront, and Connectivity Guide