@s4kh
Last active October 2, 2025 13:12

AWS EC2 High CPU Utilization Runbook

Alert: EC2 Instance High CPU Usage (>80%)

Prerequisites

  • AWS CLI configured or AWS Console access
  • SSH key for the EC2 instance
  • Instance ID from the alert

Step 1: Initial Assessment

1.1 Check CloudWatch Metrics

# Get instance CPU metrics for last hour
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-XXXXXXXXX \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Average,Maximum
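
CloudWatch does not return datapoints in chronological order, so it can help to sort them and keep only the ones above the alert threshold. A minimal sketch, assuming the call above is rerun with `--query 'Datapoints[].[Timestamp,Average,Maximum]' --output text` so that each datapoint arrives as one whitespace-separated line. `flag_hot_datapoints` is a hypothetical helper name, not part of the AWS CLI.

```shell
# Hypothetical helper: print datapoints whose Maximum exceeds a threshold
# (default 80, matching the alert), oldest first.
# Expects lines of "timestamp average maximum" on stdin.
flag_hot_datapoints() {
  sort | awk -v t="${1:-80}" \
    '$3 + 0 > t + 0 { printf "%s avg=%.1f max=%.1f\n", $1, $2, $3 }'
}

# Example usage (needs AWS credentials, so it is left commented out):
# aws cloudwatch get-metric-statistics ... \
#   --query 'Datapoints[].[Timestamp,Average,Maximum]' --output text \
#   | flag_hot_datapoints 80
```

An empty result suggests the spike ended before the one-hour window or never crossed the threshold at 5-minute granularity.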

1.2 Get Instance Details

# Get instance details and IP
aws ec2 describe-instances --instance-ids i-XXXXXXXXX \
  --query 'Reservations[0].Instances[0].[PublicIpAddress,PrivateIpAddress,InstanceType,State.Name]'

Step 2: Connect to Instance

# SSH into the instance
ssh -i /path/to/key.pem ec2-user@<PUBLIC_IP>
# Or for Ubuntu AMIs:
ssh -i /path/to/key.pem ubuntu@<PUBLIC_IP>
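
An instance in a private subnet has no public IP, in which case the query from step 1.2 yields an empty value for it. The sketch below picks the address to connect to. It assumes the AWS CLI prints `None` for a missing value in text output (`null` in JSON), and `pick_ssh_host` is a hypothetical helper name.

```shell
# Hypothetical helper: choose the address to SSH to.
#   $1 = public IP (may be empty, "None", or "null"), $2 = private IP
# Falls back to the private IP, which is reachable only from inside the VPC
# or over a VPN or bastion host.
pick_ssh_host() {
  case "$1" in
    ""|None|null) printf '%s\n' "$2" ;;
    *)            printf '%s\n' "$1" ;;
  esac
}

# Example usage (left commented out):
# ssh -i /path/to/key.pem "ec2-user@$(pick_ssh_host "$PUBLIC_IP" "$PRIVATE_IP")"
```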

Step 3: Identify High CPU Processes

3.1 Real-time CPU Usage

# Show top processes by CPU
top -b -n 1 | head -20

# Interactive alternative with better formatting
htop  # If installed; press q to quit

# Show CPU usage per core (mpstat is part of the sysstat package)
mpstat -P ALL 1 5
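
If `mpstat` is not installed, a similar breakdown can be approximated from two samples of the aggregate `cpu` line in `/proc/stat`. This is a rough sketch rather than a replacement for sysstat, and `cpu_breakdown` is a hypothetical helper name. It reports user, system, iowait, and steal time (time the hypervisor gave to other guests) as a share of all CPU time between the samples.

```shell
# Hypothetical helper: percentages between two "cpu ..." lines from /proc/stat.
# Columns after the label: user nice system idle iowait irq softirq steal ...
cpu_breakdown() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
    NR == 2 {
      for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
      if (total <= 0) exit 1
      printf "user=%.1f system=%.1f iowait=%.1f steal=%.1f\n",
        100 * d[2] / total, 100 * d[4] / total,
        100 * d[6] / total, 100 * d[9] / total
    }'
}

# Example usage on the instance (left commented out):
# first=$(head -n 1 /proc/stat); sleep 5; second=$(head -n 1 /proc/stat)
# cpu_breakdown "$first" "$second"
```

High user time points at application code, high system time at kernel or syscall overhead, and high iowait at disk or network storage rather than CPU-bound work.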

3.2 Process Investigation

# List all processes sorted by CPU usage
ps aux --sort=-%cpu | head -10

# Get detailed info about specific process
ps -p <PID> -o pid,ppid,user,%cpu,%mem,vsz,rss,tty,stat,start,time,command

# Show process tree
pstree -p <PID>

# Check how long process has been running
ps -p <PID> -o etime
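
To narrow the list to real offenders, the `ps` output can be filtered by a CPU threshold. A minimal sketch, with `busy_processes` as a hypothetical helper name and 50% as an arbitrary default. Note that `ps` reports CPU time divided by the process's running time, so a long-lived process that only recently started spinning may show a lower figure here than in `top`.

```shell
# Hypothetical helper: list processes whose %CPU is above a threshold.
# Reads "pid pcpu command" lines on stdin.
busy_processes() {
  awk -v t="${1:-50}" '$2 + 0 > t + 0 { print $1, $2, $3 }'
}

# Example usage on the instance (left commented out). Separate -o options
# with empty headers suppress the header line:
# ps -e -o pid= -o pcpu= -o comm= | busy_processes 50
```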

3.3 Historical Process Data

# Check system load averages
uptime

# Review system activity (sar is part of the sysstat package)
sar -u 1 10  # Live sample: CPU usage every second, 10 times
sar -u       # Today's recorded history, if sysstat data collection is enabled

# Check dmesg for system issues
dmesg -T | tail -50
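
The load averages from `uptime` only mean something relative to the number of cores. The sketch below divides the 1-minute load by the core count; values well above 1.0 suggest more runnable work than the instance can schedule. `load_per_core` is a hypothetical helper name, and the input is assumed to follow the `/proc/loadavg` format.

```shell
# Hypothetical helper: 1-minute load average divided by the core count.
#   $1 = contents of /proc/loadavg, $2 = number of cores
load_per_core() {
  printf '%s\n' "$1" | awk -v cores="$2" '
    cores + 0 <= 0 { exit 1 }
    { printf "%.2f\n", $1 / cores }'
}

# Example usage on the instance (left commented out):
# load_per_core "$(cat /proc/loadavg)" "$(nproc)"
```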

3.4 Check Recent Deployments

# Check if recent deployment triggered the CPU spike
# Review deployment timestamps (adjust paths based on your setup)
ls -ltr /opt/app/ | tail -5
ls -ltr /var/www/ | tail -5
stat /usr/local/bin/<app-binary>

# Check systemd service restart times
systemctl status <service-name> | grep -i "active:"  # Line reads "Active: active (running) since ..."
journalctl -u <service-name> --since "1 hour ago" | grep -i "started\|stopped"

# Docker deployments
docker ps --format "table {{.Names}}\t{{.CreatedAt}}\t{{.Status}}"
docker logs --since 1h <container-name> 2>&1 | head -20  # 2>&1 includes the container's stderr

# Check CI/CD deployment logs
tail -100 /var/log/deploy.log  # If using custom deployment logging
grep -i deploy /var/log/syslog | tail -20  # Debian/Ubuntu path; RHEL-family systems use /var/log/messages

# AWS CodeDeploy logs (if using)
tail -100 /opt/codedeploy-agent/deployment-root/deployment-logs/codedeploy-agent-deployments.log

# Compare current version with previous
cat /opt/app/version.txt  # Or wherever version is stored
git -C /opt/app log --oneline -10  # If code is in a git repo on the server (adjust the path)

# Check if configuration files were recently modified
find /etc/<app-config-dir> -type f -mmin -60  # Modified in last 60 minutes
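
The same `-mmin` test can be wrapped into a yes/no check for a single file, which is convenient when comparing several binaries or config files against the time of the alert. A small sketch, assuming GNU `find`; `recently_modified` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a file was modified within the last N minutes.
#   $1 = path, $2 = minutes
recently_modified() {
  [ -n "$(find "$1" -maxdepth 0 -mmin "-$2" 2>/dev/null)" ]
}

# Example usage on the instance (left commented out; the path is a placeholder):
# if recently_modified /opt/app/version.txt 60; then
#   echo "version file changed in the last hour; a deployment is a likely trigger"
# fi
```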

Through AWS Console

  1. Go to Systems Manager → Run Command.
  2. Click "Run command".
  3. In the "Command document" field, start typing the name and select CSP-Custom-Stop-CPU-Load-Script.
  4. Under "Target selection", select "Choose instances manually" and select your EC2 instance.
  5. Click "Run".
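
The console steps above can also be sketched with the AWS CLI. This assumes the custom document takes no required parameters, which the runbook does not state, and that the instance is managed by Systems Manager (SSM Agent running, with an instance profile that allows it). `run_stop_cpu_load` is a hypothetical wrapper name; setting `DRY_RUN=1` prints the command instead of running it.

```shell
# Hypothetical wrapper: run the custom SSM document against one instance
# and print the resulting command ID.
#   $1 = instance ID
run_stop_cpu_load() {
  set -- aws ssm send-command \
    --document-name "CSP-Custom-Stop-CPU-Load-Script" \
    --instance-ids "$1" \
    --query 'Command.CommandId' --output text
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # Show what would be executed
  else
    "$@"
  fi
}

# Example usage (needs AWS credentials, so it is left commented out):
# cmd_id=$(run_stop_cpu_load i-XXXXXXXXX)
# aws ssm get-command-invocation --command-id "$cmd_id" --instance-id i-XXXXXXXXX
```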