
AWS Monitoring

CloudWatch

metrics

  • CloudWatch provides metrics for every service in AWS
  • Metric is a variable to monitor (CPUUtilization, NetworkIn…)
  • Metrics belong to namespaces
  • Dimension is an attribute of a metric (instance id, environment, etc…)
  • Up to 30 dimensions per metric
  • Metrics have timestamps
  • Can create CloudWatch dashboards of metrics
  • Can create CloudWatch Custom Metrics
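
As a rough illustration of a custom metric with a namespace, dimensions, and a timestamp, the sketch below builds the keyword arguments for the CloudWatch PutMetricData call. The namespace `MyApp/Checkout`, the metric name, and the dimension values are made up for the example; the actual API call is left in a comment because it needs boto3 and AWS credentials.

```python
from datetime import datetime, timezone

MAX_DIMENSIONS = 30  # limit quoted in the notes above


def build_put_metric_data(namespace, metric_name, value, dimensions, unit="Count"):
    """Build keyword arguments for CloudWatch PutMetricData (one data point)."""
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError(f"at most {MAX_DIMENSIONS} dimensions per metric")
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                # Each dimension is a name/value pair attached to the metric.
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                "Timestamp": datetime.now(timezone.utc),
                "Value": value,
                "Unit": unit,
            }
        ],
    }


# Hypothetical namespace, metric, and dimension values.
kwargs = build_put_metric_data(
    "MyApp/Checkout",
    "OrdersPlaced",
    3,
    {"Environment": "dev", "InstanceId": "i-0123456789abcdef0"},
)

# With boto3 installed and credentials configured, this would publish the point:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**kwargs)
```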

metric streams

  • continually stream CloudWatch metrics to a destination of your choice, with near-real-time delivery and low latency
  • option to filter metrics to only stream a subset of them

logs

  • Log groups: arbitrary name, usually representing an application
  • Log stream: instances within application / log files / containers
  • Can define log expiration policies (never expire, 1 day to 10 years…)
  • CloudWatch Logs can send logs to:
    • Amazon S3 (exports)
    • Kinesis Data Streams
    • Kinesis Data Firehose
    • AWS Lambda
    • OpenSearch
  • Logs are encrypted by default
  • Can setup KMS-based encryption with your own keys
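
A small sketch of setting a log expiration policy, assuming the PutRetentionPolicy API of CloudWatch Logs. The list of accepted day counts is reproduced from memory of the API reference and should be checked there; the log group name is hypothetical.

```python
# Day counts accepted by PutRetentionPolicy, as recalled from the API
# reference (assumption: verify against the current documentation).
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def build_retention_policy(log_group, days):
    """Build keyword arguments for CloudWatch Logs PutRetentionPolicy."""
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not an accepted retention period")
    return {"logGroupName": log_group, "retentionInDays": days}


# Hypothetical log group; keep its events for 30 days.
retention_kwargs = build_retention_policy("/my-app/prod", 30)

# With boto3:
#   boto3.client("logs").put_retention_policy(**retention_kwargs)
# "Never expire" corresponds to having no retention policy on the group
# (delete_retention_policy removes an existing one).
```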

sources

  • SDK, CloudWatch Logs agent, CloudWatch unified agent
  • Elastic Beanstalk
  • ECS
  • AWS Lambda
  • VPC Flow Logs
  • API Gateway
  • CloudTrail, based on a filter
  • Route 53: logs DNS queries

insights

  • search and analyze log data stored in CloudWatch Logs
  • provides a purpose-built query language
  • can query multiple log groups in different AWS accounts
  • it’s a query engine, not a real-time engine
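
To give a feel for the query language, here is a minimal sketch that prepares a Logs Insights query for the StartQuery API. The log group names and the one-hour lookback are assumptions for the example; the query itself finds the 20 most recent messages containing "ERROR".

```python
import time

# A Logs Insights query: pick fields, filter, sort, and limit.
QUERY = """
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
""".strip()


def build_start_query(log_groups, query, lookback_seconds=3600, now=None):
    """Build keyword arguments for CloudWatch Logs StartQuery.

    startTime and endTime are expressed in seconds since the epoch.
    """
    end = int(now if now is not None else time.time())
    return {
        "logGroupNames": list(log_groups),
        "startTime": end - lookback_seconds,
        "endTime": end,
        "queryString": query,
    }


# Hypothetical log groups.
query_kwargs = build_start_query(["/my-app/prod", "/my-app/worker"], QUERY, now=1_700_000_000)

# With boto3, a query is started and then polled, since results are not real-time:
#   logs = boto3.client("logs")
#   query_id = logs.start_query(**query_kwargs)["queryId"]
#   results = logs.get_query_results(queryId=query_id)  # repeat until status is "Complete"
```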

S3 export

  • log data can take up to 12 hours to become available for export
  • the API call is CreateExportTask
  • exports are neither real-time nor near-real-time; use a Logs subscription instead
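
A sketch of the arguments for CreateExportTask, assuming a hypothetical bucket and log group. The API takes the time range in milliseconds since the epoch, which is the detail most easily gotten wrong.

```python
from datetime import datetime, timezone


def to_millis(dt):
    """Convert an aware datetime to milliseconds since the epoch."""
    return int(dt.timestamp() * 1000)


def build_export_task(task_name, log_group, bucket, start, end, prefix="exported-logs"):
    """Build keyword arguments for CloudWatch Logs CreateExportTask."""
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "taskName": task_name,
        "logGroupName": log_group,
        "fromTime": to_millis(start),
        "to": to_millis(end),
        "destination": bucket,  # S3 bucket name
        "destinationPrefix": prefix,
    }


# Hypothetical names; export one day of logs.
export_kwargs = build_export_task(
    "export-2024-01-01",
    "/my-app/prod",
    "my-log-archive-bucket",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)

# With boto3:
#   boto3.client("logs").create_export_task(**export_kwargs)
```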

logs subscriptions

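  • get real-time log events from CloudWatch Logs for processing and analysis
  • send them to Kinesis Data Streams, Kinesis Data Firehose, or Lambda
  • a subscription filter selects which log events are delivered to the destination
  • multi-account, multi-region aggregation is possible
  • cross-account subscriptions send log events to resources in a different AWS account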
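
When a subscription delivers to Lambda, the function receives the log events as a base64-encoded, gzip-compressed JSON document under `awslogs.data`. The sketch below decodes that payload; the handler name and the sample log content are illustrative, and the local round trip only simulates what CloudWatch Logs would send.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a subscribed Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context):
    """Hypothetical Lambda handler: return the raw log messages."""
    payload = decode_subscription_event(event)
    return [log_event["message"] for log_event in payload["logEvents"]]


# Local round trip with a made-up payload, to show the encoding.
sample_payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/my-app/prod",
    "logStream": "i-0123456789abcdef0",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "ERROR payment failed"},
    ],
}
fake_event = {
    "awslogs": {
        "data": base64.b64encode(
            gzip.compress(json.dumps(sample_payload).encode("utf-8"))
        ).decode("ascii")
    }
}
messages = handler(fake_event, None)
```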

EC2

  • an EC2 instance needs to run a CloudWatch agent to push its logs to CloudWatch
  • the instance needs the right IAM permissions to do so
  • the agent can also be set up on on-premises servers

agents

  • CloudWatch Logs agent
    • can only send logs to CloudWatch Logs
  • CloudWatch unified agent
    • collects additional system-level metrics such as RAM and processes
    • collects logs to send to CloudWatch Logs
    • centralized configuration using SSM Parameter Store
    • metrics
      • CPU
      • Disk metrics
      • RAM
      • Netstat
      • Processes
      • Swap Space
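
For orientation, this is roughly what a unified agent configuration covering the metric families above might look like, generated from Python so it can be stored (for example in SSM Parameter Store) as JSON. The measurement names are recalled from the agent documentation and the file path and log group are hypothetical, so treat the whole structure as an assumption to verify before use.

```python
import json

# Sketch of a unified agent configuration; measurement names are assumptions
# to check against the CloudWatch agent configuration reference.
agent_config = {
    "metrics": {
        "metrics_collected": {
            "cpu": {"measurement": ["cpu_usage_idle", "cpu_usage_user"], "totalcpu": True},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            "mem": {"measurement": ["mem_used_percent"]},
            "netstat": {"measurement": ["tcp_established"]},
            "processes": {"measurement": ["running", "sleeping"]},
            "swap": {"measurement": ["swap_used_percent"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/my-app/app.log",  # hypothetical path
                        "log_group_name": "/my-app/prod",        # hypothetical group
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
```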

alarm

  • to trigger notifications for any metric (sampling, %, max, min)
  • alarm states
    • OK
    • INSUFFICIENT_DATA
    • ALARM
  • period
    • length of time in seconds to evaluate the metric
    • high resolution custom metrics - 10s, 30s or multiple of 60s
  • targets
    • stop, terminate, reboot, or recover an EC2 instance
    • trigger auto scaling
    • send notification to SNS
  • composite alarms
    • a composite alarm monitors the states of multiple other alarms
    • the alarms are combined with AND / OR conditions
    • helps reduce "alarm noise" by alerting only when a combination of conditions holds
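
The alarm settings above can be sketched as arguments for PutMetricAlarm and PutCompositeAlarm. The alarm names, threshold, and instance ID are invented for the example, and the period check simply encodes the rule from the notes (10 s, 30 s, or a multiple of 60 s).

```python
def is_valid_period(seconds):
    """10 s and 30 s apply to high-resolution metrics; otherwise a multiple of 60 s."""
    return seconds in (10, 30) or (seconds > 0 and seconds % 60 == 0)


def build_cpu_alarm(instance_id, region, threshold=80.0, period=300):
    """Build keyword arguments for PutMetricAlarm with an EC2 reboot action."""
    if not is_valid_period(period):
        raise ValueError("period must be 10, 30, or a multiple of 60 seconds")
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # EC2 action target; an SNS topic ARN could be listed here instead.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:reboot"],
    }


def build_composite_alarm(name, rule):
    """Build keyword arguments for PutCompositeAlarm."""
    return {"AlarmName": name, "AlarmRule": rule}


alarm_kwargs = build_cpu_alarm("i-0123456789abcdef0", "eu-west-1")

# Fire only when both underlying alarms are in ALARM state ("high-latency" is hypothetical).
composite_kwargs = build_composite_alarm(
    "cpu-and-latency",
    'ALARM("high-cpu-i-0123456789abcdef0") AND ALARM("high-latency")',
)

# With boto3:
#   cw = boto3.client("cloudwatch")
#   cw.put_metric_alarm(**alarm_kwargs)
#   cw.put_composite_alarm(**composite_kwargs)
```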

CW insights

  • CloudWatch Container Insights
    • collect, aggregate, summarize metrics and logs from containers
    • available for ECS, EKS, Kubernetes on EC2, and Fargate; Kubernetes needs an agent
    • Metrics and logs
  • CloudWatch Lambda Insights - Detailed metrics to troubleshoot serverless applications
  • CloudWatch Contributor Insights - Find “Top-N” Contributors through CloudWatch Logs
  • CloudWatch Application Insights - Automatic dashboard to troubleshoot your application and related AWS services

EventBridge

  • schedule - cron jobs (lambda)
  • event pattern - event rules to react to a service doing something
  • trigger lambda functions, send SQS/SNS messages
  • event buses can be accessed by other AWS accounts using resource-based policies
  • archive events (all/filter) sent to an event bus (indefinitely or set period)
  • ability to replay archived events
  • schema registry
    • EventBridge can analyze the events in your bus and infer the schema
    • allows you to generate code for your application
    • can be versioned
  • resource-based policy
    • manage permissions for a specific event bus
    • example - allow/deny events from another AWS account or Region
    • use case - aggregate all events from your AWS organization in a single AWS account or AWS Region
  • security
    • when a rule runs, it needs permissions on the target
    • targets that use a resource-based policy
      • Lambda
      • SNS
      • SQS
      • CloudWatch Logs
      • API Gateway
    • targets that use an IAM role
      • Kinesis stream
      • Systems Manager Run Command
      • ECS task
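
To make the two rule types concrete, the sketch below builds PutRule arguments for an event pattern and for a schedule. The `matches` helper is a deliberately simplified, hypothetical matcher (exact values and nested objects only) meant to show how a pattern selects events; it is not EventBridge's actual matching logic, which supports many more operators.

```python
import json

# Event pattern: react when an EC2 instance is stopped or terminated.
PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist in the event, and the
    event's scalar value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def build_put_rule(name, pattern=None, schedule=None):
    """Build keyword arguments for EventBridge PutRule."""
    kwargs = {"Name": name, "State": "ENABLED"}
    if pattern is not None:
        kwargs["EventPattern"] = json.dumps(pattern)
    if schedule is not None:
        kwargs["ScheduleExpression"] = schedule
    return kwargs


pattern_rule = build_put_rule("ec2-stopped", pattern=PATTERN)
schedule_rule = build_put_rule("hourly-job", schedule="rate(1 hour)")

sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
is_match = matches(PATTERN, sample_event)

# With boto3:
#   boto3.client("events").put_rule(**pattern_rule)
```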

CloudTrail

  • Provides governance, compliance and audit for your AWS Account
  • CloudTrail is enabled by default!
  • Get a history of events / API calls made within your AWS Account
    • Console
    • SDK
    • CLI
    • AWS Services (IAM Users/Roles)
  • Can put logs from CloudTrail into CloudWatch Logs or S3
  • A trail can be applied to All Regions (default) or a single Region
  • If a resource is deleted in AWS, investigate CloudTrail first!
  • events
    • management events
      • operations that are performed on resources in your AWS account
      • examples
        • configure security
        • configure rules for routing
        • setting up logging
      • by default, trails are configured to log management events
      • can separate read / write
    • data events
      • by default, not logged (high volume operations)
      • Amazon S3 object-level activity (read/write)
      • AWS Lambda function execution activity (the Invoke API)
    • insights events - to detect unusual activity in your accounts
      • examples
        • inaccurate resource provisioning
        • hitting service limits
        • bursts of AWS IAM actions
        • gaps in periodic maintenance activity
      • CloudTrail Insights analyzes normal management events to create a baseline
      • it then continuously analyzes write events to detect unusual patterns
        • anomalies appear in the CloudTrail console
        • an event is sent to Amazon S3
        • an EventBridge event is generated (for automation needs)
    • events retention
      • events are stored for 90 days
      • to keep events longer, deliver them to S3 and query them with Athena
  • intercept API calls: API calls recorded by CloudTrail can be matched by EventBridge rules
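
For the "a resource was deleted, check CloudTrail first" case, a minimal sketch of the arguments for LookupEvents might look like this. The event name `DeleteBucket` is only an example, and the 90-day guard mirrors the retention period mentioned above.

```python
from datetime import datetime, timedelta, timezone

EVENT_HISTORY_DAYS = 90  # retention of events quoted in the notes


def build_lookup_events(event_name, days_back=7, now=None):
    """Build keyword arguments for CloudTrail LookupEvents."""
    if not 1 <= days_back <= EVENT_HISTORY_DAYS:
        raise ValueError(f"days_back must be between 1 and {EVENT_HISTORY_DAYS}")
    end = now if now is not None else datetime.now(timezone.utc)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": event_name}
        ],
        "StartTime": end - timedelta(days=days_back),
        "EndTime": end,
        "MaxResults": 50,
    }


lookup_kwargs = build_lookup_events(
    "DeleteBucket",
    days_back=7,
    now=datetime(2024, 1, 31, tzinfo=timezone.utc),
)

# With boto3:
#   response = boto3.client("cloudtrail").lookup_events(**lookup_kwargs)
#   for e in response["Events"]:
#       print(e["EventTime"], e.get("Username"), e["EventName"])
```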

AWS Config

  • Helps with auditing and recording compliance of your AWS resources
  • Helps record configurations and changes over time
  • Questions that can be solved by AWS Config
    • Is there unrestricted SSH access to my security groups?
    • Do my buckets have any public access?
    • How has my ALB configuration changed over time?
  • You can receive alerts (SNS notifications) for any changes
  • AWS Config is a per-region service
  • Can be aggregated across regions and accounts
  • Possibility of storing the configuration data into S3 (analyzed by Athena)
  • config rules
    • AWS managed config rules (over 75)
    • custom config rules (must be defined in AWS Lambda)
    • Config rules do not prevent actions from happening (no deny)
    • remediations
      • automate remediation of non-compliant resources using SSM automation documents
      • Use AWS-Managed Automation Documents or create custom Automation Documents
      • You can set Remediation Retries if the resource is still non-compliant after auto-remediation
    • notifications
      • Use EventBridge to trigger notifications when AWS resources are non-compliant
      • Ability to send configuration changes and compliance state notifications to SNS (all events – use SNS Filtering or filter at client-side)

Comparison

  • CloudWatch
    • Monitor incoming connections as a metric
    • Visualize error codes as % over time
    • Make a dashboard to get an idea of your load balancer performance
  • Config
    • Track security group rules for the Load Balancer
    • Track configuration changes for the Load Balancer
    • Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
  • CloudTrail
    • Track who made any changes to the Load Balancer with API calls

Licensed under CC BY-NC-SA 4.0