Amazon S3
Use Cases
- backup and storage
- disaster recovery
- EBS
- RDS
- archive
- hybrid cloud storage
- application hosting
- media hosting
- data lakes & big data analytics
- software delivery
- static website
http://bucket-name.s3-website-aws-region.amazonaws.com
Buckets
- buckets must have a globally unique name (across all regions and all accounts)
s3://unique-bucket-name/xxx
- buckets are defined at the region level
- Naming convention
- No uppercase, No underscore
- 3-63 characters long
- Not an IP
- Must start with lowercase letter or number
- Must NOT start with the prefix xn--
- Must NOT end with the suffix -s3alias
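The naming rules above can be sketched as a small check. This is a minimal sketch covering only the rules listed in these notes, not the full S3 rule set; the function name `is_valid_bucket_name` is hypothetical, and allowing dots and hyphens inside the name is an assumption the notes do not spell out.

```python
import ipaddress
import re

# Lowercase letters, digits, hyphens and dots (assumption); must start with
# a lowercase letter or a digit. Uppercase and underscores never match.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]*$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a name against the rules listed in these notes only."""
    if not 3 <= len(name) <= 63:
        return False
    if not _NAME_RE.match(name):
        return False
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like an IP are rejected
        return False
    except ValueError:
        return True
```

Usage: `is_valid_bucket_name("my-bucket-123")` passes, while `"My_Bucket"`, `"192.168.5.4"` and `"xn--bucket"` are rejected.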
Objects
- Objects have a KEY; the key is the full path
- the key is composed of prefix + object name
s3://my-bucket/my_folder/my_file.txt
- there’s no concept of “directory” within buckets
- Object values are the content of the body
- max size is 5TB
- if uploading more than 5GB, must use “multi-part” upload
- Metadata
- Tags - security / lifecycle
- VersionID (if versioning is enabled)
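To make the "prefix + object name" idea concrete, here is a minimal sketch that splits a key at its last slash. The helper name `split_key` is hypothetical; S3 itself stores the key as one flat string and does not perform this split.

```python
def split_key(key: str) -> tuple[str, str]:
    """Split an object key into (prefix, object name) at the last '/'."""
    prefix, sep, name = key.rpartition("/")
    # Keep the trailing slash on the prefix; a key without '/' has no prefix.
    return prefix + sep, name
```

For the example key above, `split_key("my_folder/my_file.txt")` gives the prefix `my_folder/` and the object name `my_file.txt`.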
Security
- User Based
- IAM policies - which API calls should be allowed for a specific user from IAM
- Resource Based
- Bucket Policies - bucket wide rules from s3 console - allows cross account
- Object ACL - finer grain
- Bucket ACL
- Encryption
- bucket settings for block public access
Bucket Policies
- anonymous - s3 bucket policy
- IAM user - IAM Policy
- EC2 - Instance Role - IAM Permissions
- cross-account access - s3 bucket policy
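As an illustration of the anonymous-access case, a bucket policy that allows public reads could look like the sketch below. The bucket name is hypothetical, and the policy only takes effect if the block public access settings permit it.

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",                         # anonymous callers
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",   # every object in the bucket
        }
    ],
}

# Bucket policies are attached as a JSON document.
policy_json = json.dumps(public_read_policy, indent=2)
```

For cross-account access the same shape applies, with `Principal` naming the other account instead of `*`.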
Versioning
- version files in s3
- bucket level setting
- same key overwrite will change the “version”: 1,2,3
- notes
- versioning protects against unintended deletes, easy roll back to previous version
- any file that is not versioned prior to enabling versioning will have version null
- suspending versioning does not delete the previous versions
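The behaviour described above can be pictured with a toy in-memory model. This is only a sketch to show overwrites and deletes; the class `VersionedBucket` is hypothetical, it is not the S3 API, and real version IDs are opaque strings rather than 1, 2, 3.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}             # key -> list of (version_id, body)
        self._ids = itertools.count(1)  # simplified IDs: "1", "2", "3", ...

    def put(self, key, body):
        """Writing to an existing key adds a new version on top."""
        version_id = str(next(self._ids))
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """A plain delete only adds a delete marker; old versions remain."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        """Return the latest body, or a specific version if an ID is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            return versions[-1][1] if versions else None
        return dict(versions).get(version_id)
```

After two puts and a delete on the same key, `get(key)` returns `None` (the delete marker), while `get(key, first_version_id)` still returns the original body, which is what makes rollback possible.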
Replication
- must enable versioning in source and destination buckets
- Cross Region Replication
- Same Region Replication
- Buckets can be in different AWS account
- Copying is ASYNC
- Must give proper IAM permissions to s3
- notes
- only new objects are replicated after replication is enabled
- optionally use S3 Batch Replication to replicate existing objects
- can replicate delete markers from source to target (optional setting)
- deletions with a version ID are not replicated
- no chaining of replication: if bucket 1 replicates into bucket 2, which replicates into bucket 3, then objects created in bucket 1 are not replicated to bucket 3
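A replication rule is configured on the source bucket. The sketch below shows roughly what such a configuration looks like as a Python dict, in the shape I believe boto3's `put_bucket_replication` accepts; the role ARN, account ID and bucket names are hypothetical, and the empty prefix filter (meaning "all objects") is an assumption.

```python
replication_config = {
    # IAM role that S3 assumes to copy objects (hypothetical ARN)
    "Role": "arn:aws:iam::111122223333:role/example-replication-role",
    "Rules": [
        {
            "ID": "replicate-everything",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumption: empty prefix = all objects
            # Optional setting from the notes: also replicate delete markers
            "DeleteMarkerReplication": {"Status": "Enabled"},
            "Destination": {
                # May be in another region (CRR), the same region (SRR),
                # or another account
                "Bucket": "arn:aws:s3:::example-destination-bucket",
            },
        }
    ],
}
```

Both buckets need versioning enabled before a configuration like this can be applied.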
Storage Class
- General
- General Purpose
- Intelligent Tiering
- Express One Zone
- Infrequent Access
- Standard
- One Zone
- Glacier
- pricing: storage + object retrieval cost
- Glacier - Instant Retrieval
- minimum storage duration of 90 days
- Glacier - Flexible Retrieval
- expedited (1~5 minutes), standard (3~5 hours), bulk (5~12 hours)
- minimum storage duration of 90 days
- Glacier - Deep Archive
- standard (12 hours), bulk (48 hours)
- minimum storage duration of 180 days
Storage Class Life Cycle
- transition
- move objects to standard-IA 60 days after creation
- move to glacier for archiving after 6 months
- expiration
- to delete old versions of files
- to delete incomplete multi-part uploads
- rules
- based on prefix
- based on object tags
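The transition and expiration actions above could be expressed as one lifecycle rule. This is a sketch in the shape I believe boto3's `put_bucket_lifecycle_configuration` accepts; the `logs/` prefix, the rule ID, and the 90-day and 7-day values are assumptions, and "6 months" is approximated as 180 days.

```python
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-logs",            # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},   # rule scoped by prefix
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"},  # after 60 days
                {"Days": 180, "StorageClass": "GLACIER"},     # after ~6 months
            ],
            # Expiration actions: old versions and unfinished uploads
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}
```

A rule scoped by object tags would use a `Tag` entry in `Filter` instead of `Prefix`.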
Other Features
- storage class analysis
- help you decide when to transition objects to the right storage class
- only for standard, standard IA
- report is updated daily
- 24 - 48 hours to start seeing data analysis
- Requester Pays Buckets (owner pays the storage cost, requester pays the request and data transfer cost)
- the requester must be authenticated in AWS
S3 Event Notification
- S3:ObjectCreated, S3:ObjectRemoved, S3:ObjectRestore, S3:Replication
- Object name filtering possible
- Grant IAM Permissions
- SNS, SQS, Lambda Function
- resource access policy
- with Amazon EventBridge
- advanced filtering options with JSON rules
- multiple destination - step function, kinesis streams / firehose
- event-bridge capabilities
- archive
- replay events
- reliable delivery
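Object name filtering in an event notification works on key prefix and suffix. Below is a sketch of a notification configuration for an SQS target, plus a small helper that mimics how the filter is applied. The queue ARN and filter values are hypothetical, and `key_matches` is an illustration, not an AWS function.

```python
notification_config = {
    "QueueConfigurations": [
        {
            # Hypothetical queue; its access policy must allow S3 to send
            "QueueArn": "arn:aws:sqs:us-east-1:111122223333:example-queue",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": "images/"},
                        {"Name": "suffix", "Value": ".jpg"},
                    ]
                }
            },
        }
    ]
}


def key_matches(key: str, filter_rules: list) -> bool:
    """Return True if the key satisfies every prefix/suffix rule."""
    for rule in filter_rules:
        if rule["Name"] == "prefix" and not key.startswith(rule["Value"]):
            return False
        if rule["Name"] == "suffix" and not key.endswith(rule["Value"]):
            return False
    return True
```

With these rules, an upload to `images/cat.jpg` would trigger a notification and `images/cat.png` would not.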
Baseline Performance
- Amazon S3 automatically scales to high request rates, latency 100-200 ms
- Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.
- There are no limits to the number of prefixes in a bucket.
- multi-part upload
- recommended for > 100MB
- required for > 5GB
- S3 Transfer Acceleration (Edge Location)
- S3 byte range fetches
- to speed up downloads
- to retrieve only part of the data (e.g. the head of a file)
- S3 Select & Glacier Select
- through bucket_name & key
- retrieve less data using SQL by performing server-side filtering
- filter by rows and columns (simple SQL statements)
- S3 Batch Operations
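Multi-part upload and byte-range fetches both come down to splitting an object into byte ranges that can be transferred in parallel. The sketch below shows that arithmetic together with the per-prefix request ceiling. All function names are hypothetical, and treating GB/MB as GiB/MiB is an assumption.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

MULTIPART_REQUIRED = 5 * GIB       # above this, multi-part upload is a must
MULTIPART_RECOMMENDED = 100 * MIB  # above this, multi-part is recommended


def byte_ranges(size: int, part_size: int) -> list[tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) byte ranges."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def range_header(start: int, end: int) -> str:
    """Format the HTTP Range header value for one byte-range fetch."""
    return f"bytes={start}-{end}"


def max_get_rps(prefix_count: int) -> int:
    """Baseline GET/HEAD ceiling when reads are spread over several prefixes."""
    return 5500 * prefix_count
```

For example, reading only the first 1 KiB of a file uses `range_header(0, 1023)`, and spreading reads evenly across 4 prefixes gives a baseline of `max_get_rps(4)` = 22,000 GET/HEAD requests per second.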
S3 Security
Object Encryption
- Server-Side Encryption
- SSE-S3 (Amazon S3 managed keys), default for new buckets and objects
- AES-256
- HTTP request header
x-amz-server-side-encryption: AES256
- SSE-KMS (keys managed in AWS KMS)
- KMS advantages
- user control
- audit key usage using CloudTrail
- might be impacted by the KMS limits: uploads call the GenerateDataKey KMS API, downloads call the Decrypt KMS API
- HTTP request header
x-amz-server-side-encryption: aws:kms
- SSE-C (customer-provided keys)
- must be HTTPS
- encryption key must be provided in the HTTP headers of every request
- Client-Side Encryption
- Encryption in transit (SSL/TLS), can be enforced with the aws:SecureTransport condition in a bucket policy
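One common way to force encryption in transit is a bucket policy that denies any request where `aws:SecureTransport` is false. A minimal sketch, with a hypothetical bucket name:

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

force_tls_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both the bucket itself and every object in it
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Matches requests that did NOT arrive over HTTPS
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

force_tls_json = json.dumps(force_tls_policy, indent=2)
```

Because this is an explicit Deny, it wins over any Allow granted elsewhere.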
- Cross-Origin Resource Sharing (CORS)
- allow for a specific origin or for * (all origins)
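A CORS configuration is a list of rules on the bucket. The sketch below shows one rule and a simplified check of whether an origin is allowed. The origin value is hypothetical, and `origin_allowed` handles only exact matches and `*`, which is a simplification of what S3 actually supports.

```python
cors_config = {
    "CORSRules": [
        {
            # A specific origin; use ["*"] to allow all origins
            "AllowedOrigins": ["https://www.example.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight
        }
    ]
}


def origin_allowed(origin: str, config: dict) -> bool:
    """Simplified check: exact origin match or the '*' wildcard."""
    for rule in config["CORSRules"]:
        allowed = rule["AllowedOrigins"]
        if "*" in allowed or origin in allowed:
            return True
    return False
```

With this configuration, a page served from `https://www.example.com` may fetch objects from the bucket, while other origins are refused by the browser.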
- MFA Delete
- Versioning must be enabled
- MFA is required to
- permanently delete an object version
- suspend versioning on the bucket
- only the bucket owner (root account) can enable/disable MFA Delete
- Access Logs
- any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
- target bucket must be in the same Region
- Avoid bucket loop
- Pre-Signed URLs
- expiration
- the URL inherits the permissions of the user who generated it
- S3 Glacier Vault Lock
- WORM - Write Once Read Many
- lock the policy against future edits
- helpful for compliance and data retention
- Object Lock
- must enable versioning
- WORM
- block an object version deletion for a specified amount of time
- retention mode - compliance
- object versions can’t be overwritten or deleted by any user, including the root user
- objects retention modes can’t be changed, and retention periods can’t be shortened
- retention mode - governance
- most users can’t overwrite or delete an object version or alter its lock settings
- some users have special permissions to change the retention or delete the object
- retention period - protect the object for a fixed period, can be extended
- legal hold
- protect the object indefinitely, independently from retention period
- can be freely placed and removed using the s3:PutObjectLegalHold IAM permission
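How retention mode, retention period and legal hold combine can be sketched as a single decision function. This is a toy model of the rules in these notes, not S3's implementation; the function name is hypothetical, and `bypass_governance` stands in for the special permission that some users hold in governance mode.

```python
from datetime import datetime, timezone


def can_delete_version(mode, retain_until, now,
                       legal_hold=False, bypass_governance=False):
    """Decide whether an object version may be deleted (toy model)."""
    if legal_hold:
        return False              # legal hold protects indefinitely
    if retain_until is None or now >= retain_until:
        return True               # no retention, or retention has expired
    if mode == "GOVERNANCE":
        return bypass_governance  # only users with special permission
    return False                  # COMPLIANCE: nobody, not even root


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
until = datetime(2026, 1, 1, tzinfo=timezone.utc)
```

With the sample dates above, a version under compliance mode cannot be deleted even with `bypass_governance=True`, while the same call under governance mode succeeds.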
- Access Points
- users (finance) -> finance access point (policy grants R/W to the /finance prefix) -> S3 bucket
- own DNS name (internet origin or VPC origin)
- own access point policy - manage security at scale
- VPC Origin
- only accessible from within the VPC
- must create VPC endpoint (gateway or interface) to access the Access Point
- the VPC endpoint policy must allow access to the target bucket and access point
- Object Lambda
- to change the object before it is retrieved by the caller application
IAM
- s3:ListBucket permission applies at the bucket level
- s3:PutObject permission applies at the object level
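The bucket-level versus object-level distinction shows up in the `Resource` ARN of the policy: bucket actions target the bucket ARN, object actions target `bucket/*`. A minimal sketch with a hypothetical bucket name:

```python
import json

BUCKET = "example-bucket"  # hypothetical bucket name

iam_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",    # bucket level: no /*
        },
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",  # object level: needs /*
        },
    ],
}

iam_policy_json = json.dumps(iam_policy, indent=2)
```

Mixing these up (for example `s3:PutObject` on the bare bucket ARN) is a common reason a policy looks right but grants nothing.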
Quick Catchup
- Explicit DENY in an IAM Policy will take precedence over an S3 bucket policy.
- You pay for all bandwidth into and out of S3, except for
- data transferred in from the internet
- data transferred out to EC2 instance, when the instance is in the same Region
- data transferred out to CloudFront
- replication is asynchronous; most objects reach the destination bucket within about 15 minutes