Why CloudWatch is So Helpful: Monitoring and Logging Made Easy
Amazon CloudWatch is AWS's monitoring and observability service. It collects metrics, logs, and events from your AWS resources and applications, giving you visibility into what's happening in your infrastructure.
What is CloudWatch?
CloudWatch is a comprehensive monitoring solution that provides:
- Metrics: Numerical data points (CPU usage, request count, errors)
- Logs: Application and system logs from your services
- Alarms: Automated notifications when thresholds are exceeded
- Dashboards: Visual representations of your metrics
- Insights: Analytics and queries for log data
Think of CloudWatch as your application's health monitor and log aggregator.
Why CloudWatch is Essential
1. Visibility Without Effort
CloudWatch automatically collects metrics from AWS services:
- EC2 instances (CPU, network, disk I/O, status checks)
- RDS databases (connections, storage)
- Lambda functions (invocations, errors, duration)
- Application Load Balancers (request count, response time)
- And many more...
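You don't need to install anything for these metrics; AWS services publish them by default. Metrics from inside the operating system, such as memory and disk space on EC2, require the CloudWatch agent.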
2. Centralized Logging
Instead of SSHing into servers to check logs, you can have CloudWatch Logs aggregate them from:
- EC2 instances
- Lambda functions
- Container services (ECS, EKS)
- Your applications
All logs in one place, searchable and filterable.
3. Proactive Problem Detection
Set up alarms to notify you before problems become critical:
- CPU usage above 80%
- Error rate increasing
- Disk space running low (on EC2 this requires the CloudWatch agent)
- API response time too high
Alarms publish to Amazon SNS topics, which can deliver alerts by email, SMS, and other subscriber types.
4. Historical Data and Trends
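CloudWatch stores metrics for up to 15 months, at coarser granularity as data ages (1-minute data points are kept for 15 days, 5-minute data points for 63 days, and 1-hour data points for 15 months), allowing you to: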
- Identify patterns and trends
- Plan capacity based on historical data
- Debug issues by comparing current vs. past behavior
5. Cost Monitoring
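Track AWS costs and usage through the billing metrics that CloudWatch exposes (billing alerts must first be enabled in your account's billing preferences):
- See spending trends
- Identify expensive services
- Set billing alarms on estimated charges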
CloudWatch Core Concepts
Metrics
Metrics are time-ordered data points:
- Namespace: Container for metrics (e.g., "AWS/EC2")
- Metric Name: Name of the metric (e.g., "CPUUtilization")
- Dimensions: Name-value pairs that identify unique metric streams
- Timestamp: When the data point was collected
- Value: The actual measurement
Example: AWS/EC2 CPUUtilization for instance i-1234567890abcdef0
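To make the idea of a "unique metric stream" concrete, here is a minimal sketch of how namespace, metric name, and dimensions together act as a metric's identity. The MetricKey class is a hypothetical model written for this illustration, not an AWS SDK type, and the second instance ID is made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

class MetricIdentityDemo {
    // Hypothetical model of a metric's identity (not part of the AWS SDK).
    static final class MetricKey {
        final String namespace;
        final String metricName;
        final Map<String, String> dimensions;

        MetricKey(String namespace, String metricName, Map<String, String> dimensions) {
            this.namespace = namespace;
            this.metricName = metricName;
            // Copy into a sorted map so the key owns its own dimension set.
            this.dimensions = new TreeMap<>(dimensions);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof MetricKey)) return false;
            MetricKey k = (MetricKey) other;
            return namespace.equals(k.namespace)
                    && metricName.equals(k.metricName)
                    && dimensions.equals(k.dimensions);
        }

        @Override
        public int hashCode() {
            return Objects.hash(namespace, metricName, dimensions);
        }
    }

    // Counts how many distinct metric streams a batch of data points belongs to.
    static int distinctStreams(MetricKey... keys) {
        return new HashSet<>(Arrays.asList(keys)).size();
    }

    public static void main(String[] args) {
        MetricKey first = new MetricKey("AWS/EC2", "CPUUtilization",
                Map.of("InstanceId", "i-1234567890abcdef0"));
        MetricKey second = new MetricKey("AWS/EC2", "CPUUtilization",
                Map.of("InstanceId", "i-0fedcba0987654321")); // made-up instance ID
        // Same namespace and name, different dimension value: two streams.
        System.out.println(distinctStreams(first, second, first)); // prints 2
    }
}
```

The practical consequence is that every new dimension value on a custom metric creates another stream, which matters for both querying and billing.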
Log Groups and Log Streams
- Log Group: Container for log streams (e.g., "/aws/ec2/myapp")
- Log Stream: Sequence of log events from a single source (e.g., specific EC2 instance)
Alarms
Alarms monitor metrics and trigger actions:
- Threshold: Value that triggers the alarm
- Period: Length of time over which each data point is aggregated (e.g., 5 minutes); the alarm then evaluates one or more consecutive periods
- Actions: What to do when alarm state changes (SNS, Auto Scaling, etc.)
Dashboards
Dashboards are collections of widgets showing metrics:
- Line graphs
- Number widgets
- Text widgets
- Custom widgets
CloudWatch Metrics in Action
Automatic EC2 Metrics
Every EC2 instance automatically sends these metrics (every 5 minutes with basic monitoring, every minute with detailed monitoring):
- CPUUtilization: Percentage of CPU used
- NetworkIn/NetworkOut: Bytes transferred
- DiskReadOps/DiskWriteOps: Disk I/O operations (instance store volumes only)
- StatusCheckFailed: Health check failures
View metrics:
- Go to CloudWatch Console
- Click "Metrics" → "All metrics"
- Select "EC2" → "Per-Instance Metrics"
- Select your instance and metric
Custom Metrics
You can send custom metrics from your application:
Using AWS SDK (Java):
import java.time.Instant;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.*;
// Build a client for the region that should hold the metric
CloudWatchClient cloudWatch = CloudWatchClient.builder()
    .region(Region.US_EAST_1)
    .build();
// One data point: 150 active users, observed now
MetricDatum metricDatum = MetricDatum.builder()
    .metricName("ActiveUsers")
    .value(150.0)
    .timestamp(Instant.now())
    .unit(StandardUnit.COUNT)
    .build();
// Custom metrics live in a namespace of your choosing
PutMetricDataRequest request = PutMetricDataRequest.builder()
    .namespace("MyApplication")
    .metricData(metricDatum)
    .build();
cloudWatch.putMetricData(request);
CloudWatch Logs
Sending Logs from Spring Boot
Add CloudWatch Logs dependency:
<dependency>
<groupId>ca.pjer</groupId>
<artifactId>logback-awslogs-appender</artifactId>
<version>1.6.0</version>
</dependency>
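Configure logback-spring.xml:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Console appender, referenced by the root logger below -->
    <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    <!-- Ships log events to CloudWatch Logs in batches -->
    <appender name="CLOUDWATCH" class="ca.pjer.logback.AwsLogsAppender">
        <logGroupName>my-spring-boot-app</logGroupName>
        <logStreamName>application-${HOSTNAME}</logStreamName>
        <region>us-east-1</region>
        <maxBatchLogEvents>50</maxBatchLogEvents>
        <maxFlushTimeMillis>30000</maxFlushTimeMillis>
        <layout>
            <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </layout>
    </appender>
    <root level="INFO">
        <appender-ref ref="CLOUDWATCH" />
        <appender-ref ref="CONSOLE" />
    </root>
</configuration>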
Viewing Logs
- Go to CloudWatch Console
- Click "Logs" → "Log groups"
- Select your log group
- Click on a log stream
- View and search log events
Search logs:
- Filter by time range
- Search by text (e.g., "ERROR", "Exception")
- Use filter patterns:
[timestamp, level, message]
Creating CloudWatch Alarms
Example: High CPU Usage Alarm
Scenario: Alert when EC2 instance CPU exceeds 80%
Steps:
- Go to CloudWatch Console
- Click "Alarms" → "All alarms"
- Click "Create alarm"
- Click "Select metric"
- Choose "EC2" → "Per-Instance Metrics"
- Select "CPUUtilization" metric
- Select your instance
- Click "Select metric"
Configure alarm:
- Metric: CPUUtilization
- Statistic: Average
- Period: 5 minutes
- Threshold type: Static
- Threshold: Greater than 80
- Datapoints to alarm: 2 out of 2
Configure actions:
- Notification: Create SNS topic or select existing
- Email: Enter your email address
- Alarm state trigger: In alarm
- Name alarm: "High-CPU-Alarm"
- Click "Create alarm"
Result: Once you confirm the SNS subscription email, you'll be notified when average CPU stays above 80% for two consecutive 5-minute periods (10 minutes).
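The "Datapoints to alarm: 2 out of 2" setting is easier to reason about with a small model. The following is a simplified sketch of M-out-of-N evaluation, not CloudWatch's actual implementation; the class and method names are made up, and it ignores missing-data handling, which real alarms let you configure.

```java
class AlarmEvaluationSketch {
    // Returns true when at least datapointsToAlarm of the most recent
    // evaluationPeriods data points are strictly greater than the threshold.
    static boolean inAlarm(double[] datapoints, double threshold,
                           int datapointsToAlarm, int evaluationPeriods) {
        if (datapoints.length < evaluationPeriods) {
            // Assumption: too little data means "not in alarm" in this sketch.
            return false;
        }
        int breaching = 0;
        for (int i = datapoints.length - evaluationPeriods; i < datapoints.length; i++) {
            if (datapoints[i] > threshold) {
                breaching++;
            }
        }
        return breaching >= datapointsToAlarm;
    }

    public static void main(String[] args) {
        // Each value is the average CPU (%) over one 5-minute period.
        double[] briefSpike = {40, 95, 45};
        double[] sustainedLoad = {40, 85, 92};
        System.out.println(inAlarm(briefSpike, 80, 2, 2));    // prints false
        System.out.println(inAlarm(sustainedLoad, 80, 2, 2)); // prints true
    }
}
```

Requiring 2 out of 2 is what keeps a single busy period from paging anyone; loosening it to 1 out of 2 would alert on the brief spike as well.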
Example: Error Rate Alarm
Monitor application errors:
Using CloudWatch Metric Math:
m1 = Sum of HTTP 5xx errors (per 5 minutes)
m2 = Total requests (per 5 minutes)
(m1 / m2) * 100 > 5
Create alarm when error rate exceeds 5%.
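The arithmetic behind the expression above can be sketched in a few lines. This is an illustration of the calculation only; the class and method names are hypothetical, and treating a period with zero requests as a 0% error rate is an assumption of this sketch rather than documented metric math behavior.

```java
class ErrorRateSketch {
    // Mirrors (m1 / m2) * 100, where m1 is the 5xx count and m2 the request count.
    static double errorRatePercent(long errors5xx, long totalRequests) {
        if (totalRequests == 0) {
            return 0.0; // assumption: an idle period counts as healthy
        }
        return 100.0 * errors5xx / totalRequests;
    }

    // True when the error rate is strictly above the threshold, as in "> 5".
    static boolean breaches(long errors5xx, long totalRequests, double thresholdPercent) {
        return errorRatePercent(errors5xx, totalRequests) > thresholdPercent;
    }

    public static void main(String[] args) {
        System.out.println(errorRatePercent(12, 200)); // prints 6.0
        System.out.println(breaches(12, 200, 5.0));    // prints true
        System.out.println(breaches(5, 100, 5.0));     // prints false
    }
}
```

Alarming on the ratio rather than the raw 5xx count keeps the alarm meaningful as traffic grows or shrinks.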
CloudWatch Dashboards
Create a Dashboard
- Go to CloudWatch Console
- Click "Dashboards" → "All dashboards"
- Click "Create dashboard"
- Name it: "My Application Dashboard"
- Click "Create dashboard"
Add Widgets
Example: EC2 CPU Widget
- Click "Add widget"
- Select "Line" graph
- Select "EC2" → "Per-Instance Metrics" → "CPUUtilization"
- Select your instance(s)
- Configure:
- Period: 5 minutes
- Statistic: Average
- Click "Create widget"
Example: Application Error Count
- Add "Number" widget
- Select custom metric: "MyApplication/Errors"
- Statistic: Sum
- Period: 1 hour
- Create widget
Example: Log Insights Query
- Add "Logs table" widget
- Select log group
- Enter query:
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
Dashboard Best Practices
- Group related metrics: Keep EC2, database, and application metrics together
- Use appropriate time ranges: 1 hour, 6 hours, 24 hours, 1 week
- Set meaningful titles: Make widgets self-explanatory
- Refresh automatically: Set auto-refresh for real-time monitoring
CloudWatch Logs Insights
Logs Insights lets you query and analyze log data using a purpose-built query language whose commands are chained with the pipe character (|).
Basic Queries
Get recent log entries:
fields @timestamp, @message
| sort @timestamp desc
| limit 100
Filter errors:
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
Count log entries by level (assumes each message starts with the level in square brackets, e.g. "[ERROR] ..."):
fields @message
| parse @message "[*] *" as level, message
| stats count() by level
Find slow API requests:
fields @timestamp, @message
| parse @message "GET * took *ms" as endpoint, duration
| filter duration > 1000
| sort duration desc
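To see what the glob pattern in the parse command above extracts, it can help to test it locally against a sample line. The sketch below approximates glob matching with a regular expression in which each * becomes a capture group; it is an approximation for experimenting, not the Logs Insights implementation, and the sample log line is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GlobParseSketch {
    // Turns a glob such as "GET * took *ms" into a regex with one capture group per '*'.
    static Pattern compileGlob(String glob) {
        String[] literals = glob.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                boolean trailingWildcard = i == literals.length - 1 && literals[i].isEmpty();
                // A wildcard at the very end takes the rest of the line; others stop early.
                regex.append(trailingWildcard ? "(.*)" : "(.*?)");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the captured fields in order, or an empty list if the line does not match.
    static List<String> parse(String glob, String line) {
        Matcher matcher = compileGlob(glob).matcher(line);
        List<String> fields = new ArrayList<>();
        if (matcher.find()) {
            for (int group = 1; group <= matcher.groupCount(); group++) {
                fields.add(matcher.group(group));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2024-01-01 10:00:00 GET /api/orders took 1532ms"; // made-up sample
        List<String> fields = parse("GET * took *ms", line);
        System.out.println(fields); // prints [/api/orders, 1532]
        // Equivalent of "filter duration > 1000"
        System.out.println(Integer.parseInt(fields.get(1)) > 1000); // prints true
    }
}
```

If your application logs durations in a different shape, adjust the pattern and re-run the sketch before relying on the query in a dashboard or alarm.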
Advanced Queries
Error rate over time:
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() as errorCount by bin(5m)
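The grouping that bin(5m) performs can be pictured as rounding each timestamp down to the start of its 5-minute window and counting per window. The following is a small sketch of that idea, assuming windows aligned to multiples of five minutes; the class name and sample timestamps are invented for the example.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ErrorBinningSketch {
    static final long BIN_SECONDS = 5 * 60;

    // Maps each window's start time to the number of events that fall inside it.
    static Map<Instant, Long> countByBin(List<Instant> timestamps) {
        Map<Instant, Long> counts = new TreeMap<>();
        for (Instant timestamp : timestamps) {
            // Round down to the start of the enclosing 5-minute window.
            long binStart = Math.floorDiv(timestamp.getEpochSecond(), BIN_SECONDS) * BIN_SECONDS;
            counts.merge(Instant.ofEpochSecond(binStart), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Instant> errorTimes = List.of(
                Instant.parse("2024-01-01T10:01:10Z"),
                Instant.parse("2024-01-01T10:04:59Z"),
                Instant.parse("2024-01-01T10:05:00Z"));
        // prints {2024-01-01T10:00:00Z=2, 2024-01-01T10:05:00Z=1}
        System.out.println(countByBin(errorTimes));
    }
}
```

Windows with no errors simply do not appear in the result, which is worth remembering when reading a chart built from this kind of query.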
Top endpoints by request count:
fields @message
| parse @message "GET * " as endpoint
| stats count() as requests by endpoint
| sort requests desc
| limit 10
Monitoring Spring Boot Applications
Application Metrics
Send custom metrics from Spring Boot:
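import io.micrometer.cloudwatch2.CloudWatchConfig;
import io.micrometer.cloudwatch2.CloudWatchMeterRegistry;
import io.micrometer.core.instrument.Clock;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
@Configuration
public class CloudWatchMetricsConfig {
    @Bean
    public CloudWatchMeterRegistry cloudWatchMeterRegistry() {
        CloudWatchConfig config = new CloudWatchConfig() {
            @Override
            public String get(String key) {
                return null; // fall back to Micrometer's defaults for all other settings
            }
            @Override
            public String namespace() {
                return "MySpringBootApp";
            }
        };
        // Uses the default credential and region providers of the AWS SDK
        return new CloudWatchMeterRegistry(
            config,
            Clock.SYSTEM,
            CloudWatchAsyncClient.create()
        );
    }
}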
Track custom metrics:
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Service;
// Order is your application's own domain class
@Service
public class OrderService {
    private final Counter orderCounter;
    private final Timer orderProcessingTime;
    public OrderService(MeterRegistry meterRegistry) {
        this.orderCounter = Counter.builder("orders.created")
            .description("Total orders created")
            .register(meterRegistry);
        this.orderProcessingTime = Timer.builder("orders.processing.time")
            .description("Order processing time")
            .register(meterRegistry);
    }
    public void createOrder(Order order) {
        Timer.Sample sample = Timer.start();
        try {
            // Process order
            orderCounter.increment();
        } finally {
            // Record the elapsed time even if processing throws
            sample.stop(orderProcessingTime);
        }
    }
}
Health Checks
Expose Spring Boot Actuator metrics:
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
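Then configure application.properties. The prometheus endpoint is only available when micrometer-registry-prometheus is on the classpath, and the CloudWatch namespace key depends on your integration library: the key shown here is the one used by Spring Cloud AWS 2.x, while 3.x uses management.cloudwatch.metrics.export.namespace.
management.endpoints.web.exposure.include=health,metrics,prometheus
management.metrics.export.cloudwatch.namespace=MySpringBootApp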
Cost Considerations
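CloudWatch pricing (approximate US East list prices; check the AWS pricing page for current rates):
- Custom metrics: $0.30 per metric per month for the first 10,000 metrics (first 10 metrics free)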
- API requests: $0.01 per 1,000 requests (first 1 million free)
- Log ingestion: $0.50 per GB (first 5 GB free)
- Log storage: $0.03 per GB per month
- Dashboards: $3 per dashboard per month (first 3 free)
- Alarms: $0.10 per alarm per month (first 10 free)
Cost Optimization Tips:
- Use metric math to combine metrics instead of creating multiple custom metrics
- Set log retention periods (logs older than retention are deleted)
- Use sampling for high-volume logs
- Archive old logs to S3 (cheaper storage)
- Keep dashboards lean (Logs Insights widgets are billed per GB of log data scanned each time their query runs)
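To get a feel for how much log retention affects the bill, here is a rough estimate using the ingestion and storage prices listed above. It is a back-of-the-envelope sketch with made-up names: it assumes a 30-day month, applies the 5 GB free allowance to ingestion only, and ignores the compression CloudWatch applies to stored logs, so real storage charges should come out lower.

```java
class LogCostSketch {
    static final double INGEST_PRICE_PER_GB = 0.50;
    static final double STORAGE_PRICE_PER_GB_MONTH = 0.03;
    static final double FREE_INGEST_GB = 5.0;
    static final int DAYS_PER_MONTH = 30;

    // Estimated monthly cost in USD once the log group has reached steady state.
    static double monthlyCost(double dailyIngestGb, int retentionDays) {
        double ingestedGb = dailyIngestGb * DAYS_PER_MONTH;
        double billableIngestGb = Math.max(0.0, ingestedGb - FREE_INGEST_GB);
        // At steady state the group holds retentionDays worth of logs.
        double storedGb = dailyIngestGb * retentionDays;
        return billableIngestGb * INGEST_PRICE_PER_GB
                + storedGb * STORAGE_PRICE_PER_GB_MONTH;
    }

    public static void main(String[] args) {
        // 2 GB of logs per day, kept for 30 days versus one year
        System.out.printf("30-day retention:  $%.2f%n", monthlyCost(2.0, 30));
        System.out.printf("365-day retention: $%.2f%n", monthlyCost(2.0, 365));
    }
}
```

In this model ingestion dominates at short retention periods, so reducing log volume (for example by sampling) usually saves more than shortening retention alone.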
Common Use Cases
Use Case 1: Monitor Application Performance
Metrics to track:
- Request count
- Response time (p50, p95, p99)
- Error rate
- Active users
Set alarms for:
- Response time > 1 second
- Error rate > 1%
- Request count drops significantly
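Percentiles such as p50, p95, and p99 are worth tracking because an average hides slow outliers. The sketch below computes them with the nearest-rank method on a handful of made-up latency samples; CloudWatch computes percentile statistics for you, and its exact method may differ from this illustration.

```java
import java.util.Arrays;

class PercentileSketch {
    // Nearest-rank percentile: the smallest sample with at least p percent
    // of all samples less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up response times in milliseconds, with one slow outlier
        double[] latenciesMs = {120, 95, 110, 2300, 105, 98, 130, 101, 99, 115};
        System.out.println(percentile(latenciesMs, 50)); // prints 105.0
        System.out.println(percentile(latenciesMs, 95)); // prints 2300.0
        System.out.println(Arrays.stream(latenciesMs).average().orElse(0)); // prints 327.3
    }
}
```

Here the median says a typical request takes about 105 ms, the average is distorted to 327.3 ms by one slow request, and p95 surfaces that slow request directly, which is why response-time alarms are usually set on a high percentile.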
Use Case 2: Capacity Planning
Track:
- CPU utilization trends
- Memory usage over time
- Request volume patterns
Use data to:
- Plan instance sizing
- Schedule scaling events
- Predict future capacity needs
Use Case 3: Troubleshooting
When an issue occurs:
- Check alarms for any triggered alerts
- View recent logs in Log Insights
- Compare current metrics to historical data
- Query logs for specific error patterns
- Trace request flow through logs
Use Case 4: Compliance and Auditing
Track:
- All API calls (via CloudTrail integration)
- Access patterns
- Error events
- Security-related events
Generate reports:
- Error summaries
- Access logs
- Performance reports
Best Practices
1. Set Up Alarms Early
Don't wait until production. Set up basic alarms during development.
2. Use Meaningful Names
Name metrics, logs, and alarms descriptively:
- ✅ Good: api-response-time-p95
- ❌ Bad: metric1
3. Monitor What Matters
Focus on business-critical metrics:
- User-facing errors
- Performance bottlenecks
- Cost drivers
- Security events
4. Set Appropriate Thresholds
Alarms should alert on real problems, not noise:
- Too sensitive: Alert on every spike
- Too loose: Alert only on critical failures
- Right: Alert on sustained issues
5. Review and Refine
Regularly review:
- Alarm effectiveness (false positives/negatives)
- Unused metrics and logs
- Dashboard relevance
- Cost optimization opportunities
6. Use Log Retention
Set retention policies:
- Development: 7 days
- Staging: 30 days
- Production: 90 days or longer (based on compliance needs)
7. Centralize Logs
Send all application logs to CloudWatch:
- EC2 application logs
- Container logs (ECS/EKS)
- Lambda function logs
- API Gateway logs
CloudWatch vs. Alternatives
CloudWatch Advantages
- Native AWS Integration: Works seamlessly with AWS services
- No Infrastructure: Fully managed, no servers to run
- Comprehensive: Metrics, logs, alarms, dashboards in one place
- Cost-Effective: Generous free tier
When to Consider Alternatives
- Third-party tools: If you need advanced analytics (Datadog, New Relic)
- Open source: If you want more control (Prometheus + Grafana)
- Multi-cloud: If running across AWS, Azure, GCP
For AWS-native applications, CloudWatch is usually the best choice.
Getting Started Checklist
- [ ] Enable detailed monitoring or install the CloudWatch agent on your EC2 instances (basic metrics are on by default)
- [ ] Set up basic alarms (CPU, memory via the agent, errors)
- [ ] Configure application logging to CloudWatch
- [ ] Create a dashboard with key metrics
- [ ] Set up SNS topic for alarm notifications
- [ ] Review CloudWatch pricing and optimize
- [ ] Set log retention policies
- [ ] Document your monitoring strategy
CloudWatch is your window into your AWS infrastructure and applications. Start with basic metrics and alarms, then expand as you need deeper insights. The visibility it provides is invaluable for maintaining reliable, performant applications.
Next Steps:
- Install the CloudWatch agent on your EC2 instances for memory and disk metrics
- Configure application logging
- Create your first dashboard
- Set up critical alarms
- Explore Log Insights for advanced log analysis
