Here are my notes on Amazon's RDS, Aurora, and ElastiCache services at the Solutions Architect - Associate Level. We'll cover everything from the basics of RDS and its read replicas, to Amazon Aurora's serverless and global variants, to the different caching patterns in ElastiCache. So, let's get started!
RDS
What: Managed SQL, managed backups & restore, automated provisioning & OS patching
Supports 5 engines: MariaDB, MySQL, PostgreSQL, SQL Server, Oracle
Features:
- Continuous backup and Point-in-time (incremental) restore
- Choose backup window
- Auto-scaling
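- Storage autoscaling requires a maximum storage threshold (the upper limit for allocated storage) and triggers when all of the following hold:
- Free storage is < 10% of allocated storage
- Low storage has lasted at least 5 minutes
- At least 6 hours have passed since the last storage modification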
- RDS does not support SSH into the instances by default
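The storage autoscaling trigger in the list above can be sketched in Python. This is a minimal sketch, assuming all the conditions must hold at once and that scaling never goes past the maximum storage threshold; `StorageState` and its field names are hypothetical and not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass
class StorageState:
    """Hypothetical snapshot of an instance's storage metrics."""
    allocated_gib: float
    free_gib: float
    max_threshold_gib: float
    low_storage_minutes: float
    hours_since_last_scaling: float


def should_autoscale(s: StorageState) -> bool:
    """Return True when every autoscaling condition from the notes is met."""
    below_limit = s.allocated_gib < s.max_threshold_gib   # room left under the cap
    low_free = s.free_gib < 0.10 * s.allocated_gib        # free storage < 10%
    sustained = s.low_storage_minutes >= 5                # low for at least 5 minutes
    cooled_down = s.hours_since_last_scaling >= 6         # 6 hours since last change
    return below_limit and low_free and sustained and cooled_down
```

For example, an instance with 100 GiB allocated, 5 GiB free for 10 minutes, a 1000 GiB threshold, and no scaling in the last day would qualify.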
RDS Read replicas
Read replicas:
- Up to 15
- Within-AZ, cross-AZ, cross-region (with network costs)
- Eventually consistent and asynchronous replication
- Can be promoted to a standalone DB
Use case → To scale when read load is high (select statements)
Cost → within the same region there is no network cost; cross-region replication incurs network costs ($$$)
Note: it is often better to use ElastiCache to scale reads, but read replicas keep the underlying DB available if there is an AZ or regional failure
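To make the read-scaling use case concrete, here is a minimal sketch of application-side routing: SELECT statements go to the replicas in round-robin order and everything else goes to the primary. `ReplicaRouter` and the endpoint strings are hypothetical; a real application would hand the chosen endpoint to its database driver.

```python
import itertools


class ReplicaRouter:
    """Toy router: reads go to replicas (round-robin), writes to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() repeats the replica list forever; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary.example", ["replica-1.example", "replica-2.example"])
print(router.endpoint_for("SELECT * FROM orders"))      # a replica endpoint
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # the primary endpoint
```

Because replication is asynchronous, a read that must see a write made moments earlier should still be sent to the primary.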
RDS Multi-AZ
Multi AZ:
- Synchronous replication (for every write)
- Zero downtime to set up (Single-AZ → Multi-AZ), as it uses a snapshot to create the standby copy
- Read replicas can also be set up as Multi-AZ
Use case → mainly for high availability and disaster recovery, via automatic failover that promotes the standby to master
RDS Custom
RDS custom
- RDS, but with access to underlying OS and DB customisation
- Supports only Oracle & Microsoft SQL Server DB
Use case: configure settings, patches, native features, and SSH to underlying instances
Note: Deactivate Automation Mode and take a DB snapshot before performing customisation
Amazon Aurora
What: Cloud-native, PostgreSQL/MySQL-compatible; up to 5x the performance of MySQL on RDS and 3x that of PostgreSQL on RDS
Features
- Automatic storage autoscaling → increments of 10 GB, up to 128 TB
- Up to 15 read replicas (cross-region supported), faster than MySQL replication (sub-10 ms replica lag). Replicas support autoscaling
- Automatic failover is near-instant (< 30 s)
- 0 downtime automated patching & routine maintenance
- Isolation & Security
- Industry compliance
- Backtrack → rewind the DB to an earlier point in time without restoring from a backup
Integration → Integrated securely with ML service (SageMaker & Comprehend) for ML-based predictions
How:
- 6 copies of data across 3 AZs, with self-healing via peer-to-peer replication
- 4 of 6 copies are needed for writes, 3 of 6 copies are needed for reads
Endpoints → writer endpoint (for master), reader endpoint (for read replicas)
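The 4-of-6 write and 3-of-6 read rule above is a quorum scheme, and the arithmetic can be checked with a small sketch. The function names are hypothetical, and the sketch assumes the 6 copies are spread evenly as 2 per AZ.

```python
COPIES = 6          # total copies of the data
COPIES_PER_AZ = 2   # assumption: 6 copies spread evenly over 3 AZs
WRITE_QUORUM = 4    # copies that must acknowledge a write
READ_QUORUM = 3     # copies that must answer a read


def can_write(healthy_copies):
    return healthy_copies >= WRITE_QUORUM


def can_read(healthy_copies):
    return healthy_copies >= READ_QUORUM


def quorums_overlap():
    # Any read set and any write set share at least one copy
    # when their sizes add up to more than the total.
    return WRITE_QUORUM + READ_QUORUM > COPIES


after_az_loss = COPIES - COPIES_PER_AZ       # 4 copies left
after_az_plus_one = after_az_loss - 1        # 3 copies left

print(can_write(after_az_loss), can_read(after_az_loss))
print(can_write(after_az_plus_one), can_read(after_az_plus_one))
```

Under these assumptions, losing a whole AZ still leaves enough copies for writes, and losing an AZ plus one more copy still leaves enough for reads.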
Amazon Aurora - Autoscaling
- Custom endpoint → define a custom endpoint for a subset of the replicas (e.g. replicas on a larger instance type that serve analytical queries)
Amazon Aurora - Serverless
Aurora serverless:
- Aurora, but pay per second and autoscaling based on actual usage
Use case: intermittent or unpredictable workloads
Amazon Aurora - Global
Aurora global:
- Alternative to cross region replicas.
- 1 primary region (read/write) and up to 5 secondary read-only regions (up to 16 read replicas per secondary region), with less than 1 s replication lag
RDS & Aurora Backups & Snapshots
Automated backups
- A full backup is taken daily during the backup window
- Transaction logs are backed up every 5 minutes
- Automated backups are retained for 1 - 35 days
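As a rough illustration of what these numbers mean for point-in-time restore, the sketch below estimates the restorable window. It is a simplification under stated assumptions: the earliest restorable time is taken to be `retention_days` ago, and the latest is conservatively taken to be one log interval (5 minutes) ago. `restorable_window` is a hypothetical helper, not an AWS API.

```python
from datetime import datetime, timedelta


def restorable_window(now, retention_days, log_interval_minutes=5):
    """Estimate the (earliest, latest) point-in-time restore targets."""
    if not 1 <= retention_days <= 35:
        raise ValueError("retention must be between 1 and 35 days")
    earliest = now - timedelta(days=retention_days)
    # Worst case: the most recent transaction log backup is one interval old.
    latest = now - timedelta(minutes=log_interval_minutes)
    return earliest, latest


earliest, latest = restorable_window(datetime(2024, 1, 31, 12, 0), retention_days=7)
print(earliest, latest)
```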
Manual snapshots
- Retention → As long as you want
- For a rarely used DB, taking a snapshot and deleting the instance is cheaper than leaving an idle RDS instance running; restore from the snapshot when the DB is needed again
Copy-on-write (Aurora cloning):
- Using the copy-on-write protocol, we can clone a DB faster than with snapshot and restore
- Use case: copy prod db to staging/dev environments
RDS & Aurora Restores for on-prem
Restores for RDS and Aurora always create a new database
- S3 → RDS MySQL: Upload the on-prem DB backup to S3, then restore it
- S3 → Aurora MySQL: Use Percona XtraBackup to back up your on-prem DB to S3, then load it into Aurora
RDS & Aurora Proxy
What: pools and shares DB connections via a proxy (fewer open connections, better resource utilisation). Typically used by Lambda
- Managed feature, serverless, highly available
- Reduces failover time by up to 66%
- Enforces IAM authentication and is accessible only from within the VPC
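The idea behind connection pooling can be shown with a toy in-process pool: many short-lived callers (such as Lambda invocations) borrow from a small, fixed set of connections instead of each opening their own. This is only an illustration of the concept, not how RDS Proxy is implemented; `ConnectionPool` and `fake_connect` are hypothetical names.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Toy pool: opens `size` connections once and lends them out."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks until one is free
        try:
            yield conn
        finally:
            self._idle.put(conn)                # always hand it back


opened = []


def fake_connect():
    """Stand-in for opening a real database connection."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):                 # 100 "invocations" share 2 connections
    with pool.connection() as conn:
        pass                         # a real caller would run queries here
print(len(opened))
```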
RDS & Aurora Security
At rest encryption
- KMS-encrypted master & replicas, defined at launch time
- Read replicas can only be encrypted if the master is encrypted
- To encrypt an unencrypted DB, take a snapshot → restore it as encrypted
In-flight encryption
- TLS-ready by default
Access via IAM roles; network access controlled with security groups
Audit logs can be enabled and sent to CloudWatch Logs
ElastiCache
What: managed Redis or Memcached
Use case: user session store / improve DB scalability / reduce read latency
Caching patterns
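- Lazy loading - data is loaded into the cache only on a cache miss, so all read data ends up cached (but can become stale)
- Write through - adds or updates the cache whenever the DB is written
- Session store - session data stored in the cache with a TTL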
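The three caching patterns above can be sketched with an in-memory stand-in for the cache. This is a minimal sketch, not ElastiCache client code: `Cache`, `read_lazy`, and `write_through` are hypothetical names, and a plain dict plays the role of the database.

```python
import time


class Cache:
    """Tiny in-memory cache with optional per-key TTL (in seconds)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]      # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._data[key] = (value, expires_at)


def read_lazy(cache, db, key):
    """Lazy loading: hit the DB only on a cache miss, then populate the cache."""
    value = cache.get(key)
    if value is None:                # miss (a stored None also counts as a miss here)
        value = db[key]
        cache.set(key, value)
    return value


def write_through(cache, db, key, value):
    """Write through: update the DB and the cache together."""
    db[key] = value
    cache.set(key, value)


db = {"user:1": "alice"}
cache = Cache()
print(read_lazy(cache, db, "user:1"))          # miss, loaded from db
write_through(cache, db, "user:1", "carol")    # db and cache now agree
cache.set("session:abc", {"user": "user:1"}, ttl=1800)  # session store with TTL
```

Lazy loading keeps the cache small but can serve stale values if the DB changes behind it; write through avoids that at the cost of an extra cache write on every DB write.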
ElastiCache (Memcached)
- Multi-node (sharding) but no replication
- Non persistent, no backup and restore
- Multi-threaded architecture
- Authentication: SASL based (advanced)
- Use case: workloads that can afford to lose data and need high throughput
ElastiCache (Redis)
- Multi-az w/ failover
- Read replicas
- Persistent data, with backup and restore
- Supports sets & sorted sets
- Authentication: IAM support
- Redis also has a Redis AUTH feature, where clients must supply a password before running Redis commands
- Redis also supports the transit-encryption-enabled feature to ensure TLS/SSL communication with and between Redis nodes
- Use case: highly available, persistent cache