- CloudWatch is for monitoring/performance.
- CloudTrail is for auditing API calls, e.g. when, from where, and by whom they were made.
- Can monitor Compute (EC2, ASG, ELB, Route53 health checks..), Storage & Content Delivery (EBS, Storage)...
- Metrics
- Provides metrics (e.g. CPU utilization, network utilization, disk reads/writes, status checks) for every service in AWS.
- Metrics belong to namespaces
- A dimension is an attribute of a metric (instance id, environment, etc.)
- Up to 30 dimensions per metric (the limit was 10 before 2022)
- ❗ Metrics are retained for up to 15 months; older data is kept at coarser granularity.
- Metrics have timestamps
- EC2 Detailed Monitoring
- 📝 With basic monitoring, EC2 instance metrics are reported "every 5 minutes"
- With detailed monitoring (for a cost), you get data "every 1 minute"
- 💡 Use detailed monitoring if you want your ASG to scale more promptly
- AWS Free Tier allows up to 10 detailed monitoring metrics
- ❗ EC2 memory usage is by default not pushed
- 💡 Must be pushed from inside the instance as a custom metric with e.g. CloudWatch agent.
- Custom Metrics
- Possibility to define and send your own custom metrics to CloudWatch
- E.g. for RAM utilization, disk storage usage.
- Must be pushed from inside the instance by installing an agent, e.g. the CloudWatch agent
- Ability to use dimensions (attributes) to segment metrics
- E.g. `Instance.id`, `Environment.name`
- Metric resolution
- Standard: 1 minute
- High resolution: up to 1 second (`StorageResolution` API parameter)
- Higher cost
- API call: `PutMetricData`
- 💡 Use exponential back off in case of throttle errors
- ❗ CloudWatch itself does not have a native export feature that will send data periodically to S3.
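The back-off advice for throttled `PutMetricData` calls can be sketched as follows. This is a minimal illustration, not the SDK's own retry logic: `ThrottledError` and the `send` callable are hypothetical stand-ins for a real throttling error and a real API call, and the delay values are arbitrary assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the API."""


def call_with_backoff(send, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call send() and retry with exponential back-off plus jitter on throttling."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # The delay ceiling doubles on each attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Sleep a random time in [0, ceiling] so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

In real code `send` would wrap the actual `PutMetricData` request; the `sleep` parameter exists only so the waiting can be replaced in tests.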
- Dashboards
- Can create CloudWatch dashboard of metrics
- In console you can monitor & access dashboards
- Can be regional or global (e.g. include graphs from different regions)
- You can change the time zone & time range of the dashboards
- You can set up automatic refresh (10s, 1m, 2m, 5m, 15m)
- 3 dashboards (up to 50 metrics) for free
- $3 per dashboard per month afterwards
- Widget types are:
- Line: compare metrics over time
- Stacked area: Compare the total over time
- Number: instantly see the latest value for a metric
- Text: Free text with markdown formatting
- Query results: Explore results from Logs Insights
- Logs
- Applications can send logs to CloudWatch using the SDK
- CloudWatch Logs metric filters can evaluate CloudTrail logs for specific terms, phrases or values.
- I.e. metric values do not always come from built-in CloudWatch metrics; they can also be derived from logs, e.g. by counting HTTP errors.
- CloudWatch can collect logs from Elastic Beanstalk, ECS, AWS Lambda, VPC Flow Logs, API Gateway, CloudTrail, CloudWatch log agents, Route53 and more.
- CloudWatch Log Agents
- Install on EC2 machines: `sudo yum install -y awslogs`
- Ensure EC2 has IAM permissions to write to CloudWatch
- Configure `/etc/awslogs/awslogs.conf` for logs (errors etc.)
- Configure `/etc/awslogs/awscli.conf` for region
- Start service with `systemctl start awslogsd`
- CloudWatch logs can go to:
- Batch export to S3 for archival
- Stream to an Elasticsearch cluster for further analytics
- Stream to Lambda
- Logs are stored at two levels:
- Log groups: arbitrary name, usually representing an application
- Log stream: instances within application / log files / containers
- Can define log expiration policies (never expire, 30 days, etc..)
- You pay for data retention in CloudWatch
- Using the AWS CLI we can tail CloudWatch logs
- E.g. to see how an application is behaving in real time
- Security
- ❗ To send logs to CloudWatch, make sure IAM permissions are correct!
- Logs can be encrypted with KMS at the log group level
- CloudWatch Logs can use filter expressions
- E.g. find a specific IP inside of a log
- Metric filters can be used to trigger alarms
- E.g. if specific IP appears you can trigger an alarm
- CloudWatch Logs Insights
- Log analytics service for CloudWatch
- Can be used to query logs and add queries into CloudWatch Dashboards
- Pay for the queries you run
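The metric-filter idea from this section (turning matching log lines into a number that an alarm can watch) can be sketched locally. This is only an illustration of the concept: the function names and sample log lines are made up, and plain substring matching is an assumption that is simpler than the real CloudWatch Logs filter pattern syntax.

```python
def count_matching(log_lines, term):
    """Return how many log lines contain the given term (plain substring match)."""
    return sum(1 for line in log_lines if term in line)


def should_alarm(log_lines, term, threshold=1):
    """True when the derived metric value reaches the alarm threshold."""
    return count_matching(log_lines, term) >= threshold


# Made-up sample lines for illustration only.
sample_logs = [
    "203.0.113.7 GET /index.html 200",
    "198.51.100.4 GET /admin 403",
    "203.0.113.7 POST /login 500",
]

print(count_matching(sample_logs, "203.0.113.7"))  # 2
print(should_alarm(sample_logs, "192.0.2.1"))      # False
```

A substring match would also count `203.0.113.70`, so a real filter for a specific IP would need a stricter pattern.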
- Alarms
- Alarms are used to trigger notifications for any metric
- You can set up billing alarms that trigger once the account charges exceed a certain threshold.
- Alarms invoke actions such as:
- EC2 Actions: e.g. reboot an EC2 instance.
- SNS Notifications: email, SMS, etc.
- Auto Scaling: triggers Auto Scaling policies.
- Various options (sampling, %, max, min, etc...)
- Alarm states: `OK`, `INSUFFICIENT_DATA`, `ALARM` (being triggered)
- Period:
- Length of time in seconds to evaluate the metric
- 📝 High resolution custom metrics
- Granularity decreases as metrics age: 1 second (for 3 hours), then 1 minute (for 15 days), 5 minutes (for 63 days), 1 hour (for 15 months).
- E.g. `NetworkOut < 2,000,000` for 1 data point (EC2) within 5 minutes
- Data points
- Represent the values of the metric over time
- E.g. if the period is 5 minutes and data points is 3, the alarm triggers once the condition has been met for 15 minutes
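The interaction between period and data points can be sketched as below. This is a simplified model under stated assumptions: it requires every one of the most recent data points to breach, uses a "less than" comparison as in the `NetworkOut` example, and ignores options such as missing-data handling. The function names are made up for illustration.

```python
def time_to_alarm_seconds(period_seconds, datapoints):
    """Minimum time the condition must hold before the alarm can fire."""
    return period_seconds * datapoints


def alarm_state(values, threshold, datapoints):
    """Evaluate a 'value < threshold' alarm.

    values: one aggregated metric value per period, oldest first.
    """
    if len(values) < datapoints:
        return "INSUFFICIENT_DATA"
    recent = values[-datapoints:]
    # Simplification: all of the most recent data points must breach.
    if all(v < threshold for v in recent):
        return "ALARM"
    return "OK"


print(time_to_alarm_seconds(300, 3))  # 900 seconds = 15 minutes
print(alarm_state([5_000_000, 1_000_000, 1_500_000, 900_000], 2_000_000, 3))  # ALARM
print(alarm_state([1_000_000, 3_000_000, 900_000], 2_000_000, 3))  # OK
print(alarm_state([1_000_000], 2_000_000, 3))  # INSUFFICIENT_DATA
```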
- Events
- Event Rule
- Types
- Schedule: Triggered on a fixed schedule (e.g. a cron or rate expression)
- Event Pattern: React to service doing something e.g. CodePipeline state changes.
- Targets: e.g. Lambda function, EC2 `StopInstances` API call, SNS, SQS, ECS Task, Event bus in another AWS account...
- Triggers to Lambda functions, SQS/SNS/Kinesis Messages
- CloudWatch Events creates a small JSON document that describes the change
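How an event pattern reacts to the small JSON document describing a change can be sketched with a simplified matcher. This is an assumption-laden approximation of the matching rules, not the service's implementation: each pattern field lists the accepted values, nested objects are matched recursively, and advanced operators are left out. The pipeline name in the sample event is made up.

```python
def matches(pattern, event):
    """Return True if every field in the pattern accepts the event's value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event must have a matching nested object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of accepted values.
            return False
    return True


# Illustrative event shaped like a CodePipeline state change.
event = {
    "source": "aws.codepipeline",
    "detail-type": "CodePipeline Pipeline Execution State Change",
    "detail": {"pipeline": "my-pipeline", "state": "FAILED"},
}

pattern = {
    "source": ["aws.codepipeline"],
    "detail": {"state": ["FAILED", "CANCELED"]},
}

print(matches(pattern, event))  # True
```

Fields that the pattern does not mention are ignored, which is why the pattern above does not need to name the pipeline.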
- CloudTrail tracks API events, allowing you to see who accessed which resources and when.
- CloudTrail reports on who made the change, when, and from which location.
- Per AWS account & per region
- 💡 Should be enabled in all regions with a CloudFormation stack.
- All accounts / regions can log into same S3 bucket in an account / region.
- When you apply a trail to all regions, CloudTrail creates the same trail in every other region.
- Enabled by default
- Default metrics come from the hypervisor (e.g. CPU, connections)
- Many services have deeper "Advanced monitoring" for metrics the hypervisor cannot see, such as connected users or CPU usage per thread / application.
- Encryption
- A single KMS key can be used to encrypt log files for trails applied to all regions.
- CloudTrail log files are by default encrypted using S3 Server Side Encryption (SSE)
- You can also enable SSE-KMS encryption for additional security
- Get a history of events / API calls made within your AWS account by the Console, SDK, CLI, or AWS services
- Can push to S3 (encrypted by default), CloudWatch Logs and SNS.
- Event history has 90 days of retention
- 📝 If a resource is deleted in AWS, look into CloudTrail first!
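Looking into CloudTrail for a deleted resource amounts to filtering event records by API name and reading who made the call, when, and from where. A minimal sketch over already-downloaded records follows. The record fields (`eventTime`, `eventName`, `userIdentity`, `sourceIPAddress`, `requestParameters`) follow the CloudTrail record layout, but the sample data, the `find_deletions` helper, and the "Delete"/"Terminate" prefix heuristic are assumptions made for illustration.

```python
def find_deletions(records, resource_id):
    """Return (time, caller ARN, source IP, API name) for delete-like calls."""
    hits = []
    for record in records:
        name = record.get("eventName", "")
        # Heuristic: destructive API calls usually start with these verbs.
        if not name.startswith(("Delete", "Terminate")):
            continue
        # The parameter layout differs per API, so search its text form.
        if resource_id in str(record.get("requestParameters", {})):
            hits.append((
                record.get("eventTime"),
                record.get("userIdentity", {}).get("arn"),
                record.get("sourceIPAddress"),
                name,
            ))
    return hits


# Made-up sample records for illustration only.
records = [
    {"eventTime": "2024-01-01T10:00:00Z", "eventName": "DescribeInstances",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/alice"},
     "sourceIPAddress": "203.0.113.7", "requestParameters": {}},
    {"eventTime": "2024-01-01T10:05:00Z", "eventName": "DeleteBucket",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/bob"},
     "sourceIPAddress": "198.51.100.4",
     "requestParameters": {"bucketName": "example-bucket"}},
]

for hit in find_deletions(records, "example-bucket"):
    print(hit)
```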
- AWS Config lets you assess, audit, and evaluate the configurations of your AWS resources.
- Resources include RDS, subnets, DB snapshots, security groups, and event subscriptions.
- Reports on what has changed
- You can e.g. look back and see what instances were in default VPC last week.
- Per AWS account & per region
- 💡 Should be enabled in all regions with a CloudFormation stack.
- Data aggregation in AWS Config allows you to aggregate AWS Config data from multiple accounts and regions into a single account.
- Tracks resource state.
- AWS Config is about compliance, while Trusted Advisor is more about recommendations, but they check the same things for security.
- Integrates with SNS to receive notifications.