S3

Amazon S3

Use Cases

  • backup and storage
  • disaster recovery
    • EBS
    • RDS
  • archive
  • hybrid cloud storage
  • application hosting
  • media hosting
  • data lakes & big data analytics
  • software delivery
  • static website
    • http://bucket-name.s3-website-aws-region.amazonaws.com

Buckets

  • buckets must have a globally unique name (across all Regions and all accounts)
    • s3://unique-bucket-name/xxx
  • buckets are defined at the region level
  • Naming convention
    • No uppercase, no underscores
    • 3-63 characters long
    • Not formatted as an IP address (e.g. 192.168.5.4)
    • Must start and end with a lowercase letter or number
    • Must NOT start with the prefix xn--
    • Must NOT end with the suffix -s3alias
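A minimal Python sketch of the naming rules above. It assumes only the listed rules plus the allowed character set (lowercase letters, digits, dots, hyphens); AWS documents further rules (for example, no two adjacent periods) that it does not check. `is_valid_bucket_name` is a hypothetical helper, not an AWS API.

```python
import ipaddress
import re

# Lowercase letters, digits, dots and hyphens; first and last character must be a letter or digit.
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9.-]*[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a name against the naming rules listed above (not every AWS rule)."""
    if not 3 <= len(name) <= 63:
        return False
    if not _ALLOWED.match(name):
        return False  # uppercase, underscore, or a bad first/last character
    try:
        ipaddress.IPv4Address(name)
        return False  # formatted like an IP address
    except ipaddress.AddressValueError:
        pass
    if name.startswith("xn--") or name.endswith("-s3alias"):
        return False
    return True


print(is_valid_bucket_name("my-bucket-01"))  # True
print(is_valid_bucket_name("My_Bucket"))     # False
```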

Objects

  • Objects have a key, and the key is the full path
  • the key is composed of prefix + object name
    • s3://my-bucket/my_folder/my_file.txt
  • there’s no concept of “directory” within buckets
  • Object values are the content of the body
    • max object size is 5 TB
    • if uploading more than 5 GB, must use multi-part upload
  • Metadata
  • Tags - security / lifecycle
  • VersionID (if versioning is enabled)
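Since the key is just a string, the "folders" shown in the console are only prefixes of it. A small sketch that splits an `s3://` URI into bucket, key, prefix and object name; `parse_s3_uri` and `S3Location` are hypothetical names, not part of any AWS SDK.

```python
from typing import NamedTuple


class S3Location(NamedTuple):
    bucket: str
    key: str     # the full path inside the bucket
    prefix: str  # everything up to and including the last "/"
    name: str    # the object name after the prefix


def parse_s3_uri(uri: str) -> S3Location:
    """Split s3://bucket/key into bucket, key, prefix and object name."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an s3:// URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    head, sep, name = key.rpartition("/")  # no "/" in key -> empty prefix
    return S3Location(bucket, key, head + sep, name)


print(parse_s3_uri("s3://my-bucket/my_folder/my_file.txt"))
```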

Security

  • User Based
    • IAM policies - which API calls should be allowed for a specific user from IAM
  • Resource Based
    • Bucket Policies - bucket-wide rules set from the S3 console; allow cross-account access
    • Object ACL - finer-grained
    • Bucket ACL
  • Encryption
  • Block Public Access bucket settings

Bucket Policies

  • anonymous (public) access - S3 bucket policy
  • IAM user - IAM policy
  • EC2 instance - instance role with IAM permissions
  • cross-account access - S3 bucket policy

AWS Policy Generator

{
  "Id": "Policy1704947111194",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1704947108663",
      "Action": ["s3:GetObject"],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::iris-service-s3-bucket/test.jpg",
      "Principal": "*"
    }
  ]
}
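The same kind of policy can be generated in code. A sketch assuming a public-read use case; `public_read_policy` and the `Sid` value are hypothetical, and a public policy is only accepted if the bucket's Block Public Access settings allow it.

```python
import json


def public_read_policy(bucket: str, key: str = "*") -> dict:
    """Bucket policy letting anyone (Principal "*") GET one object, or every object."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                # object-level ARN: arn:aws:s3:::<bucket>/<key or *>
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }


print(json.dumps(public_read_policy("iris-service-s3-bucket", "test.jpg"), indent=2))
```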

Versioning

  • version your files in S3
  • enabled at the bucket level
  • overwriting the same key creates a new version, each with its own version ID
  • notes
    • versioning protects against unintended deletes and makes it easy to roll back to a previous version
    • any file that was not versioned before versioning was enabled has version ID null
    • suspending versioning does not delete the previous versions
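A toy in-memory model of the notes above, assuming sequential IDs (`v1`, `v2`, ...) for readability; real version IDs are opaque strings, and the model ignores MFA Delete and lifecycle rules. `ToyVersionedBucket` is hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class ToyVersionedBucket:
    """In-memory toy model of S3 versioning (not the real service)."""

    status: str = "Disabled"  # "Disabled", "Enabled" or "Suspended"
    # key -> list of (version_id, body), oldest first; body None marks a delete marker
    history: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def _next_version_id(self, key: str) -> str:
        if self.status == "Enabled":
            return f"v{next(self._ids)}"  # real S3 version IDs are opaque strings
        # Unversioned or suspended writes get version ID "null" and replace any earlier "null" version.
        self.history[key] = [v for v in self.history.get(key, []) if v[0] != "null"]
        return "null"

    def put(self, key: str, body: str) -> str:
        version_id = self._next_version_id(key)
        self.history.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key: str) -> str:
        """Delete without a version ID: adds a delete marker, older versions survive.
        (With versioning disabled real S3 removes the object; this toy keeps a null marker.)"""
        version_id = self._next_version_id(key)
        self.history.setdefault(key, []).append((version_id, None))
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[str]:
        versions = self.history.get(key, [])
        if version_id is not None:
            return next((body for vid, body in versions if vid == version_id), None)
        return versions[-1][1] if versions else None  # None if the latest is a delete marker


bucket = ToyVersionedBucket()
bucket.put("index.html", "old")        # written before versioning: version "null"
bucket.status = "Enabled"
bucket.put("index.html", "new")        # version "v1"
bucket.delete("index.html")            # version "v2", a delete marker
print(bucket.get("index.html"))        # None: the object looks deleted
print(bucket.get("index.html", "v1"))  # roll back by reading an older version
```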

Replication

  • must enable versioning in source and destination buckets
  • Cross Region Replication
  • Same Region Replication
  • Buckets can be in different AWS accounts
  • Copying is asynchronous
  • Must give proper IAM permissions to S3 (via an IAM role)
  • notes
    • only new objects are replicated after replication is enabled
    • optionally use S3 Batch Replication to replicate existing objects
    • can replicate delete markers from source to target (optional setting)
    • deletions with a version ID are not replicated (protects against malicious deletes)
    • no chaining of replication: if bucket 1 replicates into bucket 2, which replicates into bucket 3, objects created in bucket 1 are not replicated to bucket 3
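A synchronous toy model of the replication notes (real replication is asynchronous): writes made directly to a bucket are replicated, replicas are not forwarded, and objects written before a rule exists are not copied. `ToyBucket` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    """Toy, synchronous model of S3 replication (not the real service)."""

    name: str
    objects: dict = field(default_factory=dict)       # key -> (body, is_replica)
    replicate_to: list = field(default_factory=list)  # destination ToyBucket instances

    def put(self, key: str, body: str) -> None:
        """A direct client write: stored as an original and copied to each destination."""
        self.objects[key] = (body, False)
        for destination in self.replicate_to:
            destination._store_replica(key, body)

    def _store_replica(self, key: str, body: str) -> None:
        # Replicas are stored but never forwarded: replication does not chain.
        self.objects[key] = (body, True)


b1, b2, b3 = ToyBucket("bucket-1"), ToyBucket("bucket-2"), ToyBucket("bucket-3")
b1.put("existing.txt", "before the rule")  # written before replication was enabled
b1.replicate_to.append(b2)                  # bucket 1 -> bucket 2
b2.replicate_to.append(b3)                  # bucket 2 -> bucket 3
b1.put("new.txt", "hello")
print(sorted(b2.objects))  # ['new.txt']: only new objects are replicated
print(sorted(b3.objects))  # []: no chaining from bucket 1 to bucket 3
```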

Storage Class

  • General
    • Standard (general purpose)
    • Intelligent-Tiering
    • Express One Zone
  • Infrequent Access
    • Standard-IA
    • One Zone-IA
  • Glacier
    • pricing: storage + object retrieval cost
    • Glacier - Instant Retrieval
      • minimum storage duration of 90 days
    • Glacier - Flexible Retrieval
      • expedited (1-5 minutes), standard (3-5 hours), bulk (5-12 hours)
      • minimum storage duration of 90 days
    • Glacier - Deep Archive
      • standard (12 hours), bulk (48 hours)
      • minimum storage duration of 180 days
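Minimum storage duration means an object removed early is still billed as if it had stayed for the minimum. A sketch of that arithmetic, assuming the API storage class names `GLACIER_IR`, `GLACIER` and `DEEP_ARCHIVE` and covering only the classes listed above plus S3 Standard (which has no minimum); the function returns billed days, not a price, and `billed_storage_days` is hypothetical.

```python
# Minimum storage durations in days (from the list above; STANDARD has none).
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "GLACIER_IR": 90,     # Glacier Instant Retrieval
    "GLACIER": 90,        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,  # Glacier Deep Archive
}


def billed_storage_days(storage_class: str, days_stored: int) -> int:
    """Days charged for an object deleted after days_stored days in storage_class."""
    # Raises KeyError for classes this sketch does not cover.
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billed_storage_days("GLACIER_IR", 30))  # 90: deleted early, billed the minimum
```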

Storage Class Lifecycle

  • transition actions
    • e.g. move objects to Standard-IA 60 days after creation
    • e.g. move to Glacier for archiving after 6 months
  • expiration actions
    • e.g. delete old versions of files (if versioning is enabled)
    • e.g. delete incomplete multi-part uploads
  • rules
    • based on prefix
    • based on object tags
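A sketch of the example transitions and both expiration cases as one lifecycle rule, in the shape accepted by boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. The rule ID, the `logs/` prefix, the 90-day noncurrent-version expiry, the 7-day multipart cleanup and "6 months = 180 days" are assumptions for illustration.

```python
import json


def lifecycle_config(prefix: str) -> dict:
    """Lifecycle configuration mirroring the examples above (values are illustrative)."""
    return {
        "Rules": [
            {
                "ID": "archive-then-clean-up",  # hypothetical rule name
                "Filter": {"Prefix": prefix},   # rule scoped by prefix; tags are also possible
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 60, "StorageClass": "STANDARD_IA"},  # Standard-IA after 60 days
                    {"Days": 180, "StorageClass": "GLACIER"},     # Glacier Flexible Retrieval after ~6 months
                ],
                # Expiration: remove old (noncurrent) versions 90 days after they are replaced.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
                # Expiration: clean up multi-part uploads that never completed.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    }


print(json.dumps(lifecycle_config("logs/"), indent=2))
```

With boto3 the returned dict would be passed as the `LifecycleConfiguration` argument; the code above only builds it and does not call AWS.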

Other Features

  • storage class analysis
    • help you decide when to transition objects to the right storage class
    • only for Standard and Standard-IA
    • report is updated daily
    • 24 - 48 hours to start seeing data analysis
  • Requester Pays buckets: the requester, not the bucket owner, pays for requests and data downloads (the owner still pays storage)
    • the requester must be authenticated in AWS

S3 Event Notification

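  • event types: s3:ObjectCreated, s3:ObjectRemoved, s3:ObjectRestore, s3:Replication
  • object name filtering possible (prefix / suffix, e.g. *.jpg)
  • grant permissions on the destination
    • SNS topic, SQS queue, Lambda function
    • each destination needs a resource access policy that allows S3 to publish / send / invoke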
  • with Amazon EventBridge
    • advanced filtering options with JSON rules
    • multiple destinations - Step Functions, Kinesis Data Streams / Firehose
    • EventBridge capabilities
      • archive
      • replay events
      • reliable delivery
SQS access policy that allows S3 to send messages to the queue:
{
  "Id": "Policy1704960814568",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1704960813632",
      "Action": ["sqs:SendMessage"],
      "Effect": "Allow",
      "Resource": "arn:aws:sqs:ap-northeast-1:224071036262:MyQueue",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:::my-bucket"
        }
      }
    }
  ]
}
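Before wiring up a destination it can help to reason about which events it will receive. A rough sketch of event-type wildcards and prefix/suffix filter rules, using the notification-configuration shape (`Events`, `Filter.Key.FilterRules`) from the S3 API; the `images/` prefix, the `.jpg` suffix and `would_notify` are hypothetical, and real event keys arrive URL-encoded.

```python
from fnmatch import fnmatchcase

QUEUE_CONFIG = {
    "QueueArn": "arn:aws:sqs:ap-northeast-1:224071036262:MyQueue",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {"Key": {"FilterRules": [
        {"Name": "prefix", "Value": "images/"},  # hypothetical prefix
        {"Name": "suffix", "Value": ".jpg"},     # hypothetical suffix
    ]}},
}


def would_notify(config: dict, event_name: str, key: str) -> bool:
    """Approximate whether S3 would send event_name for key to this destination."""
    if not any(fnmatchcase(event_name, pattern) for pattern in config["Events"]):
        return False
    for rule in config.get("Filter", {}).get("Key", {}).get("FilterRules", []):
        name = rule["Name"].lower()
        if name == "prefix" and not key.startswith(rule["Value"]):
            return False
        if name == "suffix" and not key.endswith(rule["Value"]):
            return False
    return True


print(would_notify(QUEUE_CONFIG, "s3:ObjectCreated:Put", "images/cat.jpg"))  # True
```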

Baseline Performance

  • Amazon S3 automatically scales to high request rates, with a latency of 100-200 ms
  • Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix in a bucket.
  • There are no limits to the number of prefixes in a bucket.
  • multi-part upload
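    • recommended for files > 100 MB
    • required for files > 5 GB
  • S3 Transfer Acceleration - routes transfers through edge locations to speed them up
  • S3 byte range fetches
    • to speed up downloads by fetching byte ranges in parallel
    • to retrieve only part of the data (e.g. the head of a file)
  • S3 Select & Glacier Select
    • query a single object, identified by bucket name & key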
    • retrieve less data using SQL by performing server-side filtering
    • filter by rows and columns (simple SQL statements)
  • S3 Batch Operations
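A sketch of how a client might split work for multi-part upload and byte-range fetches. It assumes the documented multipart limits (parts of 5 MiB to 5 GiB, only the last part may be smaller, at most 10,000 parts) and a hypothetical preferred part size of 100 MiB; `plan_parts` and `range_headers` are hypothetical helpers that only compute ranges and never call S3.

```python
import math
from typing import List, Tuple

MiB = 1024 ** 2
GiB = 1024 ** 3
MIN_PART = 5 * MiB   # every part except the last must be at least 5 MiB
MAX_PART = 5 * GiB
MAX_PARTS = 10_000


def plan_parts(size: int, preferred_part: int = 100 * MiB) -> List[Tuple[int, int, int]]:
    """Split `size` bytes into (part_number, first_byte, last_byte) for multi-part upload."""
    # Grow the part size if the preferred one would need more than 10,000 parts.
    part = max(MIN_PART, preferred_part, math.ceil(size / MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("object too large for multi-part upload")
    return [
        (number, start, min(start + part, size) - 1)
        for number, start in enumerate(range(0, size, part), start=1)
    ]


def range_headers(size: int, chunk: int) -> List[str]:
    """HTTP Range header values (inclusive byte ranges) for parallel GETs."""
    return [f"bytes={start}-{min(start + chunk, size) - 1}" for start in range(0, size, chunk)]


print(len(plan_parts(250 * MiB)))  # 3
print(range_headers(10, 4))        # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```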

S3 Security

Object Encryption

  • Server-Side Encryption
    • SSE-S3 (Amazon S3 managed keys), the default for new buckets and objects
      • AES-256
      • http request header x-amz-server-side-encryption: AES256
    • SSE-KMS (keys managed in AWS KMS)
      • KMS advantages
        • user control
        • audit key usage using CloudTrail
      • might be impacted by the KMS limits
        • GenerateDataKey KMS API
        • Decrypt KMS API
      • http request header x-amz-server-side-encryption: aws:kms
    • SSE-C (customer-provided keys)
      • must use HTTPS
      • the encryption key must be provided in HTTP headers with every request
  • Client-Side Encryption
  • Encryption in transit (SSL/TLS); enforce it with the aws:SecureTransport condition in a bucket policy
  • Cross-Origin Resource Sharing (CORS)
    • allow for a specific origin or for * (all origins)
  • MFA Delete
    • versioning must be enabled
    • MFA is required to permanently delete an object version
    • MFA is required to suspend versioning on the bucket
    • only the bucket owner (root account) can enable/disable MFA Delete
  • Access Logs
    • any request made to S3, from any account, authorized or denied, is logged into another S3 bucket
    • the target bucket must be in the same Region
    • avoid a logging loop: never use the monitored bucket as its own log target
  • Pre-Signed URLs
    • expire after a configurable time
    • users of the URL inherit the permissions of the user who generated it (GET / PUT)
  • S3 Glacier Vault Lock
    • WORM - Write Once Read Many
    • lock the policy against future edits
    • helpful for compliance and data retention
  • Object Lock
    • must enable versioning
    • WORM
    • block an object version deletion for a specified amount of time
    • retention mode - compliance
      • object versions can’t be overwritten or deleted by any user, including the root user
      • object retention modes can’t be changed, and retention periods can’t be shortened
    • retention mode - governance
      • most users can’t overwrite or delete an object version or alter its lock settings
      • some users have special permissions to change the retention or delete the object
    • retention period - protect the object for a fixed period, can be extended
    • legal hold
      • protect the object indefinitely, independently from retention period
      • can be freely placed and removed using the s3:PutObjectLegalHold IAM permission
  • Access Points
    • users (finance) -> finance access point (policy grant R/W to /finance prefix) -> S3 bucket
    • own DNS name (internet origin or VPC origin)
    • own access point policy - manage security at scale
    • VPC Origin
      • only accessible from within the VPC
      • must create VPC endpoint (gateway or interface) to access the Access Point
      • the VPC endpoint policy must allow access to the target bucket and access point
  • Object Lambda
    • to change the object before it is retrieved by the caller application
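A sketch of the request headers that select each server-side encryption mode on an upload. Header names follow the S3 REST API; `sse_headers` is a hypothetical helper, and in practice an SDK sets these headers for you (SSE-C also requires HTTPS).

```python
import base64
import hashlib
from typing import Dict, Optional


def sse_headers(mode: str, kms_key_id: Optional[str] = None,
                customer_key: Optional[bytes] = None) -> Dict[str, str]:
    """Headers that request server-side encryption for a PUT (sketch, not an SDK call)."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id is not None:  # omit to let S3 use its default KMS key
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_md5 = hashlib.md5(customer_key).digest()  # lets S3 check the key arrived intact
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(key_md5).decode(),
        }
    raise ValueError(f"unknown encryption mode: {mode}")


print(sse_headers("SSE-KMS"))  # {'x-amz-server-side-encryption': 'aws:kms'}
```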

IAM

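  • s3:ListBucket applies at the bucket level (Resource arn:aws:s3:::bucket-name)
  • s3:GetObject / s3:PutObject / s3:DeleteObject apply at the object level (Resource arn:aws:s3:::bucket-name/*)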
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::azusachino-secret-s3"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::azusachino-secret-s3/*"]
    }
  ]
}
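Both statements are needed because a `Resource` pattern ending in `/*` matches object ARNs but not the bucket ARN itself. A simplified sketch that matches ARNs against the policy above with `fnmatch`-style `*` wildcards; real IAM evaluation also considers conditions, principals and `NotResource`, and `allowed` and `target_arn` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional

BUCKET = "azusachino-secret-s3"

# Resource patterns granted per action, copied from the policy above.
GRANTS = {
    "s3:ListBucket": [f"arn:aws:s3:::{BUCKET}"],
    "s3:GetObject": [f"arn:aws:s3:::{BUCKET}/*"],
    "s3:PutObject": [f"arn:aws:s3:::{BUCKET}/*"],
    "s3:DeleteObject": [f"arn:aws:s3:::{BUCKET}/*"],
}


def target_arn(bucket: str, key: Optional[str] = None) -> str:
    """Bucket-level requests target the bucket ARN, object-level requests an object ARN."""
    return f"arn:aws:s3:::{bucket}" if key is None else f"arn:aws:s3:::{bucket}/{key}"


def allowed(action: str, bucket: str, key: Optional[str] = None) -> bool:
    """Simplified: does any granted Resource pattern match the target ARN?
    (fnmatch also treats [...] specially, which IAM does not; fine for these ARNs.)"""
    arn = target_arn(bucket, key)
    return any(fnmatchcase(arn, pattern) for pattern in GRANTS.get(action, []))


print(allowed("s3:ListBucket", BUCKET))                             # True
print(fnmatchcase(target_arn(BUCKET), f"arn:aws:s3:::{BUCKET}/*"))  # False: "/*" needs a key
```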

Quick Catchup

  • Explicit DENY in an IAM Policy will take precedence over an S3 bucket policy.
  • You pay for all bandwidth into and out of S3, except for
    • data transferred in from the internet
    • data transferred out to an EC2 instance in the same Region
    • data transferred out to CloudFront
  • most objects replicate to the destination bucket within about 15 minutes; S3 Replication Time Control (RTC) is designed to replicate 99.99% of new objects within 15 minutes, backed by an SLA
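The explicit-DENY rule can be sketched for the same-account case: a request is allowed if the IAM policy or the bucket policy allows it and neither explicitly denies it. This is a simplification (`is_allowed` is hypothetical); cross-account requests need an allow on both sides, and SCPs, ACLs, Block Public Access and permission boundaries are ignored.

```python
from typing import List


def is_allowed(iam_effects: List[str], bucket_policy_effects: List[str]) -> bool:
    """Same-account sketch over the Effects of the statements that match the request."""
    effects = iam_effects + bucket_policy_effects
    if "Deny" in effects:
        return False           # an explicit DENY wins over any Allow
    return "Allow" in effects  # otherwise one Allow from either policy is enough


print(is_allowed(["Deny"], ["Allow"]))  # False
print(is_allowed([], ["Allow"]))        # True
```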

Licensed under CC BY-NC-SA 4.0