AWS Monitoring
CloudWatch
metrics
- CloudWatch provides metrics for every service in AWS
- Metric is a variable to monitor (CPUUtilization, NetworkIn…)
- Metrics belong to namespaces
- Dimension is an attribute of a metric (instance id, environment, etc…)
- Up to 30 dimensions per metric
- Metrics have timestamps
- Can create CloudWatch dashboards of metrics
- Can create CloudWatch Custom Metrics
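A custom metric with dimensions can be sketched as a plain dictionary shaped like one entry of the PutMetricData request. This is a minimal sketch, not a definitive implementation: the metric name, namespace, and dimension values are hypothetical, and the boto3 call is shown only as a comment because it needs credentials.

```python
from datetime import datetime, timezone

MAX_DIMENSIONS = 30  # per-metric dimension limit noted above


def build_metric_datum(name, value, dimensions, unit="Count", timestamp=None):
    """Return one entry shaped like an item of PutMetricData's MetricData list."""
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError(f"at most {MAX_DIMENSIONS} dimensions per metric")
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


datum = build_metric_datum(
    "QueueDepth",  # hypothetical metric name
    42,
    {"InstanceId": "i-0123456789abcdef0", "Environment": "dev"},  # hypothetical values
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])
```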
metric streams
- continually stream CloudWatch metrics to a destination of your choice, with near-real-time delivery and low latency
- option to filter metrics to only stream a subset of them
logs
- Log groups: arbitrary name, usually representing an application
- Log stream: instances within application / log files / containers
- Can define log expiration policies (never expire, 1 day to 10 years…)
- CloudWatch Logs can send logs to:
- Amazon S3 (exports)
- Kinesis Data Streams
- Kinesis Data Firehose
- AWS Lambda
- OpenSearch
- Logs are encrypted by default
- Can set up KMS-based encryption with your own keys
sources
- SDK, CloudWatch Logs agent, CloudWatch unified agent
- Elastic Beanstalk
- ECS
- AWS Lambda
- VPC flow logs
- API Gateway
- CloudTrail based on filter
- Route 53 - log DNS queries
insights
- search and analyze log data stored in CloudWatch Logs
- provides a purpose-built query language
- can query multiple log groups in different AWS accounts
- it’s a query engine, not a real-time engine
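As an illustration of the query language, the sketch below builds a Logs Insights query string and the arguments for the StartQuery API. The log group names and the search pattern are hypothetical, and the boto3 call is left as a comment since it requires an AWS account.

```python
import time


def build_insights_query(pattern, limit=20):
    """Build a Logs Insights query that returns the newest messages matching a regex."""
    return (
        "fields @timestamp, @message"
        f" | filter @message like /{pattern}/"
        " | sort @timestamp desc"
        f" | limit {limit}"
    )


def build_start_query_args(log_groups, pattern, minutes=60, now=None):
    """Assemble keyword arguments in the shape of the StartQuery API."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupNames": list(log_groups),
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": build_insights_query(pattern),
    }


# Hypothetical log groups; a fixed "now" keeps the example reproducible.
args = build_start_query_args(["/my-app/web", "/my-app/worker"], "ERROR", now=1_700_000_000)
# With boto3: boto3.client("logs").start_query(**args), then poll get_query_results.
```

Because Insights is a query engine, results come back by polling the query, not as a live stream.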
S3 export
- log data can take up to 12 hours to become available for export
- API call: CreateExportTask
- not near-real-time or real-time; use logs subscriptions instead
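A minimal sketch of preparing a CreateExportTask request follows. It only builds the argument dictionary; the log group, bucket, prefix, and task naming scheme are assumptions made for illustration.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt):
    """CreateExportTask expects times as milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


def build_export_task_args(log_group, bucket, start, end, prefix="exported-logs"):
    """Assemble keyword arguments in the shape of the CreateExportTask API."""
    if end <= start:
        raise ValueError("end must be after start")
    return {
        "taskName": f"export-{to_epoch_ms(start)}",  # hypothetical naming scheme
        "logGroupName": log_group,
        "fromTime": to_epoch_ms(start),
        "to": to_epoch_ms(end),
        "destination": bucket,  # S3 bucket name
        "destinationPrefix": prefix,
    }


export_args = build_export_task_args(
    "/my-app/web",            # hypothetical log group
    "my-log-archive-bucket",  # hypothetical bucket
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
# With boto3: boto3.client("logs").create_export_task(**export_args)
```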
logs subscriptions
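- get real-time log events from CloudWatch Logs for processing and analysis
- send to Kinesis Data Streams, Kinesis Data Firehose, Lambda
- subscription filter: selects which log events are delivered to the destination
- multi-account, multi-region aggregation
- cross-account subscription: send log events to a destination in another AWS account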
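When a subscription delivers to Lambda, the log events arrive base64-encoded and gzip-compressed under `awslogs.data`. The sketch below decodes such a payload and keeps only error messages; the sample event is built locally with made-up values so the code runs without AWS.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the base64-encoded, gzip-compressed payload CloudWatch Logs sends to Lambda."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def handler(event, context=None):
    """Lambda-style entry point: return the messages that contain 'ERROR'."""
    payload = decode_subscription_event(event)
    return [e["message"] for e in payload["logEvents"] if "ERROR" in e["message"]]


# Build a sample event locally so the sketch runs without AWS (values are made up).
sample = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/my-app/web",
    "logStream": "i-0123456789abcdef0",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "ERROR db timeout"},
        {"id": "2", "timestamp": 1700000001000, "message": "INFO request ok"},
    ],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
errors = handler({"awslogs": {"data": encoded}})
```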
EC2
- need to run a CloudWatch agent to collect logs
- requires IAM permissions
- agent can be set up on-premises
agents
- CloudWatch Logs agent
- only sends logs
- CloudWatch unified agent
- collect additional system-level metrics, RAM, processes
- logs
- centralized configuration using SSM parameter store
- metrics
- CPU
- Disk metrics
- RAM
- Netstat
- Processes
- Swap Space
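A unified agent configuration that collects both metrics and logs might look roughly like the following, generated here from Python so it can be stored as JSON. Treat the layout as an assumption to check against the agent documentation; the file path, log group name, and selected measurements are illustrative only.

```python
import json

# Assumed layout of a unified agent configuration file; the log file path,
# log group name, and chosen measurements are illustrative.
agent_config = {
    "metrics": {
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "swap": {"measurement": ["swap_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["*"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/my-app/app.log",
                        "log_group_name": "/my-app/web",
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    },
}

config_json = json.dumps(agent_config, indent=2)
# The JSON could be stored in an SSM Parameter Store parameter so that
# many instances fetch the same configuration.
```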
alarm
- to trigger notifications for any metric (sampling, %, max, min)
- alarm states
- ok
- insufficient_data
- alarm
- period
- length of time in seconds to evaluate the metric
- high resolution custom metrics - 10s, 30s or multiple of 60s
- targets
- stop, terminate, reboot, recover EC2
- trigger auto scaling
- send notification to SNS
- composite alarms
- composite alarms are monitoring the states of multiple other alarms
- AND / OR
- to reduce “alarm noise” by creating complex composite alarms
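The AND / OR combination used by composite alarms can be sketched as a rule-expression builder. The child alarm names are hypothetical, and the boto3 call is shown only as a comment.

```python
def alarm_term(name, state="ALARM"):
    """Render one term of a composite alarm rule, e.g. ALARM("high-cpu")."""
    if state not in ("OK", "ALARM", "INSUFFICIENT_DATA"):
        raise ValueError(f"unknown alarm state: {state}")
    return f'{state}("{name}")'


def composite_rule(names, operator="AND"):
    """Join child alarms into a rule expression using AND or OR."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(alarm_term(n) for n in names)


rule = composite_rule(["high-cpu", "high-iops"])  # hypothetical child alarm names
# With boto3: boto3.client("cloudwatch").put_composite_alarm(
#     AlarmName="cpu-and-iops", AlarmRule=rule, AlarmActions=[sns_topic_arn])
```

Requiring both child alarms with AND is one way to cut noise: a CPU spike alone would not notify anyone.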
CW insights
- CloudWatch Container Insights
- collect, aggregate, summarize metrics and logs from containers
- ECS, EKS, Kubernetes on EC2, Fargate (needs agent for Kubernetes)
- Metrics and logs
- CloudWatch Lambda Insights - Detailed metrics to troubleshoot serverless applications
- CloudWatch Contributor Insights - Find “Top-N” Contributors through CloudWatch Logs
- CloudWatch Application Insights - Automatic dashboard to troubleshoot your application and related AWS services
EventBridge
- schedule - cron jobs (lambda)
- event pattern - event rules to react to a service doing something
- trigger lambda functions, send SQS/SNS messages
- event buses can be accessed by other AWS accounts using resource-based policies
- archive events (all/filter) sent to an event bus (indefinitely or set period)
- ability to replay archived events
- schema registry
- EventBridge can analyze the events in your bus and infer the schema
- allows you to generate code for your application
- can be versioned
- resource-based policy
- manage permissions for a specific event bus
- example - allow/deny events from another AWS account or Region
- use case - aggregate all events from your AWS organization in a single AWS account or AWS Region
- security
- when a rule runs, it needs permissions on the target
- resource-based policy: Lambda, SNS, SQS, CloudWatch Logs, API Gateway
- IAM role: Kinesis stream, Systems Manager Run Command, ECS task
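To make the event pattern idea concrete, here is a deliberately simplified matcher. It is not how EventBridge is implemented and covers only exact-value matching; the pattern and event use the EC2 state-change notification shape with a made-up instance ID.

```python
def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event.

    A list in the pattern means "the event value equals one of these";
    a nested dict is matched recursively. Real EventBridge patterns also
    support operators (prefix, numeric, anything-but, ...) not handled here.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}

is_match = matches(pattern, event)
```

A rule with this pattern would fire for stopped or terminated instances and ignore other state changes.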
CloudTrail
- Provides governance, compliance and audit for your AWS Account
- CloudTrail is enabled by default!
- Get a history of events / API calls made within your AWS Account
- Console
- SDK
- CLI
- AWS Services (IAM Users/Roles)
- Can put logs from CloudTrail into CloudWatch Logs or S3
- A trail can be applied to All Regions (default) or a single Region
- If a resource is deleted in AWS, investigate CloudTrail first!
- events
- management events
- operations that are performed on resources in your AWS account
- examples
- configure security
- configure rules for routing
- setting up logging
- by default, trails are configured to log management events
- can separate read / write
- data events
- by default, not logged (high volume operations)
- Amazon S3 object-level activity (R/W)
- Lambda function execution activity (Invoke API)
- insights events - to detect unusual activity in your accounts
- examples
- inaccurate resource provisioning
- hitting service limits
- bursts of AWS IAM actions
- gaps in periodic maintenance activity
- analyzes normal management events to create a baseline
- then continuously analyzes write events to detect unusual patterns
- anomalies appear in the CloudTrail console
- event is sent to Amazon S3
- an EventBridge event is generated (for automation needs)
- examples
- events retention
- events are stored for 90 days
- to keep events longer, log them to S3 and query them with Athena
- management events
- intercept API calls
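As a sketch of the "investigate CloudTrail first" advice, the function below scans CloudTrail-style records for non-read-only calls whose name starts with `Delete`. The records are made-up samples that use standard record fields (`eventTime`, `eventSource`, `eventName`, `readOnly`, `userIdentity`); fetching real records is out of scope here.

```python
def find_deletions(records):
    """Return (time, who, action) for non-read-only events whose name starts with Delete."""
    hits = []
    for r in records:
        if r.get("readOnly"):
            continue  # skip read-only calls such as Describe*/List*
        if r["eventName"].startswith("Delete"):
            who = r.get("userIdentity", {}).get("arn", "unknown")
            hits.append((r["eventTime"], who, f'{r["eventSource"]}:{r["eventName"]}'))
    return hits


# Made-up sample records for illustration.
records = [
    {
        "eventTime": "2024-01-01T10:00:00Z",
        "eventSource": "elasticloadbalancing.amazonaws.com",
        "eventName": "DescribeLoadBalancers",
        "readOnly": True,
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
    },
    {
        "eventTime": "2024-01-01T10:05:00Z",
        "eventSource": "elasticloadbalancing.amazonaws.com",
        "eventName": "DeleteLoadBalancer",
        "readOnly": False,
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
    },
]

deletions = find_deletions(records)
```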
AWS Config
- Helps with auditing and recording compliance of your AWS resources
- Helps record configurations and changes over time
- Questions that can be solved by AWS Config
- Is there unrestricted SSH access to my security groups?
- Do my buckets have any public access?
- How has my ALB configuration changed over time?
- You can receive alerts (SNS notifications) for any changes
- AWS Config is a per-region service
- Can be aggregated across regions and accounts
- Possibility of storing the configuration data into S3 (analyzed by Athena)
- config rules
- AWS managed config rules (over 75)
- custom config rules (must be defined in AWS lambda)
- AWS Config rules do not prevent actions from happening (no deny)
- remediations
- automate remediation of non-compliant resources using SSM automation documents
- Use AWS-Managed Automation Documents or create custom Automation Documents
- You can set Remediation Retries if the resource is still non-compliant after auto-remediation
- notifications
- Use EventBridge to trigger notifications when AWS resources are non-compliant
- Ability to send configuration changes and compliance state notifications to SNS (all events – use SNS Filtering or filter at client-side)
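The "unrestricted SSH" question above could be answered by a custom rule along these lines. This is a sketch under stated assumptions: the ingress rule shape (`from_port`, `to_port`, `cidr`) is a simplified, hypothetical stand-in for the real configuration item, and only the returned dictionary follows the shape of a PutEvaluations entry.

```python
from datetime import datetime, timezone


def allows_unrestricted_ssh(rule):
    """True if an ingress rule opens port 22 to the whole IPv4 internet."""
    covers_ssh = rule["from_port"] <= 22 <= rule["to_port"]
    return covers_ssh and rule["cidr"] == "0.0.0.0/0"


def evaluate_security_group(group_id, ingress_rules, timestamp):
    """Return an evaluation shaped like an entry of PutEvaluations' Evaluations list."""
    open_ssh = any(allows_unrestricted_ssh(r) for r in ingress_rules)
    return {
        "ComplianceResourceType": "AWS::EC2::SecurityGroup",
        "ComplianceResourceId": group_id,
        "ComplianceType": "NON_COMPLIANT" if open_ssh else "COMPLIANT",
        "OrderingTimestamp": timestamp,
    }


# Hypothetical security group with one safe rule and one risky rule.
rules = [
    {"from_port": 443, "to_port": 443, "cidr": "0.0.0.0/0"},
    {"from_port": 22, "to_port": 22, "cidr": "0.0.0.0/0"},
]
evaluation = evaluate_security_group(
    "sg-0123456789abcdef0", rules, datetime(2024, 1, 1, tzinfo=timezone.utc)
)
```

The rule only reports the finding; as noted above, it does not block the change, so remediation or notification has to be wired up separately.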
Comparison
ELB
- CloudWatch
- Monitor incoming connections as a metric
- Visualize error codes as % over time
- Make a dashboard to get an idea of your load balancer performance
- Config
- Track security group rules for the Load Balancer
- Track configuration changes for the Load Balancer
- Ensure an SSL certificate is always assigned to the Load Balancer (compliance)
- CloudTrail
- Track who made any changes to the Load Balancer with API calls