#
types
RDBMS (= SQL / OLTP): RDS, [[#Aurora]] – great for joins
NoSQL database – no joins, no SQL : [[#DynamoDB]] (~JSON), ElastiCache
(key / value pairs), [[#Neptune]] (graphs), [[#DocumentDB]] (for MongoDB), Keyspaces
(for Apache Cassandra)
Object Store: [[#S3]] (for big objects) / Glacier (for backups / archives)
Data Warehouse (= SQL Analytics / BI): Redshift (OLAP), Athena, EMR
Search: OpenSearch (JSON) – free text, unstructured searches
Graphs: Amazon Neptune – displays relationships between data
Ledger : Amazon Quantum Ledger Database
Time series: Amazon Timestream
#
RDS
Managed PostgreSQL / MySQL / Oracle / SQL Server / MariaDB / Custom
Provisioned RDS Instance Size and EBS Volume Type & Size
Auto-scaling capability for Storage
underlines maybe not enabled by default
Support for Read Replicas and Multi AZ
Security through IAM, Security Groups, KMS , SSL in transit
Automated Backup with Point in time restore feature (up to 35 days)
Manual DB Snapshot for longer-term recovery
Managed and Scheduled maintenance (with downtime)
Support for IAM Authentication, integration with Secrets Manager
RDS Custom for access to and customize the underlying instance (Oracle & SQL Server)
Use case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactions
#
Aurora
Compatible API for PostgreSQL/MySQL, separation of storage and compute
Storage: data is stored in 6 replicas, across 3AZ – highly available, self-healing, auto-scaling
Compute: Cluster of DB Instance across multiple AZ, auto-scaling of ReadReplicas
Cluster: Custom endpoints for writer and readerDB instances
Same security / monitoring / maintenance features as RDS
Security through IAM, Security Groups, KMS , SSL in transit
Know the backup & restore options for Aurora
Aurora Serverless – for unpredictable / intermittent workloads, no capacity planning
Aurora Global: up to 16 DB Read Instances in each region, < 1 second storage replication
Aurora Machine Learning: perform ML using SageMaker & Comprehend on Aurora
Aurora Database Cloning:new cluster from existing one, faster than restoring a snapshot
Use case: same as RDS, but with less maintenance / more flexibility / more performance / more features
endpoints
cluster - primary
reader
custom
instance
#
ElastiCache
Managed Redis / Memcached
(similar offering as RDS, but for caches)
In-memory data store, sub-millisecond latency
Select an ElastiCache
instance type (e.g., cache.m6g.large)
Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)
Security through IAM, Security Groups, KMS, Redis Auth
Backup / Snapshot / Point in time restore feature
Managed and Scheduled maintenance
Requires some application code changes to be leveraged
Use Case: Key/Value store, Frequent reads, less writes, cache results for DB queries, store session data for websites, cannot use SQL.
#
DynamoDB
AWS proprietary technology, managed serverless NoSQL database, millisecond latency
Capacity modes: provisioned capacity with optional auto-scaling or on-demand capacity
Can replace ElastiCache
as a key/value store (storing session data for example, using TTL feature)
Highly Available, Multi-AZ by default, Read and Writes are decoupled, transaction capability
DAX cluster for read cache, microsecond read latency
Security, authentication and authorization is done through IAM
Event Processing: DynamoDB Streams to integrate with AWS Lambda, or Kinesis Data Streams
Global Table feature: active-active setup
Automated backups up to 35 days with PITR (restore to new table), or on-demand backups
Export to S3 without using RCU within the PITR window, import from S3 without using WCU
Great to rapidly evolve schemas
Use Case: Serverless applications development (small documents 100s KB), distributed serverless cache
#
S3
S3 is a… key / value store for objects
Great for bigger objects, not so great for many small objects
Serverless, scales infinitely, max object size is 5 TB, versioning capability
Tiers: S3 Standard, S3 Infrequent Access, S3 Intelligent, S3 Glacier + lifecycle policy
Features: Versioning, Encryption, Replication, MFA-Delete, Access Logs… [[aws-big-data#Athena|Athena]]
Security: IAM, Bucket Policies, ACL, Access Points, Object Lambda, CORS, Object/Vault Lock
Encryption: SSE-S3, SSE-KMS, SSE-C, client-side,TLS in transit, default encryption
Batch operations on objects using S3 Batch, listing files using S3 Inventory
Performance: Multi-part upload, S3 Transfer Acceleration, S3Select
Automation: S3 Event Notifications (SNS, SQS, Lambda, EventBridge)
Use Cases: static files, key value store for big files, website hosting
#
DocumentDB
DocumentDB is the same for MongoDB (which is a NoSQL database)
MongoDB is used to store, query, and index JSON data
Similar “deployment concepts” as Aurora
Fully Managed, highly available with replication across 3 AZ
DocumentDB storage automatically grows in increments of 10GB
Automatically scales to workloads with millions of requests per seconds
#
Neptune
Fully managed graph
database
A popular graph dataset would be a social network
Users have friends
Posts have comments
Comments have likes from users
Users share and like posts…
Highly available across 3 AZ, with up to 15 read replicas
Build and run applications working with highly connected datasets – optimized for these complex and hard queries
Can store up to billions of relations and query the graph with milliseconds latency
Highly available with replications across multiple AZ
Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
#
Keyspaces
A managed Apache Cassandra-compatible database service
Serverless, Scalable, highly available, fully managed by AWS
Automatically scale tables up/down based on the application’s traffic
Tables are replicated 3 times across multiple AZ
Using the Cassandra Query Language (CQL)
Single-digit millisecond latency at any scale, 1000s of requests per second
Capacity: On-demand mode or provisioned mode with auto-scaling
Encryption, backup, Point-In-Time Recovery (PITR) up to 35 days
Use cases: store IoT devices info, time-series data, …
#
QLDB
QLDB stands for ”Quantum Ledger Database”
Fully Managed, Serverless, High available, Replication across 3AZ
Used to review history of all the changes made to your application data over time
Immutable system: no entry can be removed or modified, cryptographically verifiable
2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
#
Timestream
Fully managed, fast, scalable, serverless time series database
Automatically scales up/down to adjust capacity
Store and analyze trillions of events per day
1000s times faster & 1/10th the cost of relational databases
Scheduled queries, multi-measure records, SQL compatibility
Data storage tiering: recent data kept in memory and historical data kept in a cost-optimized storage
Built-in time series analytics functions (helps you identify patterns in your data in near real-time)
Encryption in transit and at rest
Use cases: IoT apps, operational applications, real-time analytics, …
#
Quick Catchup
RDS events only provide operational events such as DB instance events, DB parameter group events, DB security group events, and DB snapshot events. What we need in the scenario is to capture data-modifying events (INSERT
, DELETE
, UPDATE
) which can be achieved through native functions or stored procedures.
Amazon RDS provides metrics in real time for the operating system (OS) that your DB instance runs on. You can view the metrics for your DB instance using the console, or consume the Enhanced Monitoring JSON output from CloudWatch Logs in a monitoring system of your choice. By default, Enhanced Monitoring metrics are stored in the CloudWatch Logs for 30 days. To modify the amount of time the metrics are stored in the CloudWatch Logs, change the retention for the RDSOSMetrics
log group in the CloudWatch console.
DynamoDB
use partition keys with high-cardinality attributes, which have a large number of distinct values for each item
Licensed under CC BY-NC-SA 4.0