@glorat
Last active March 29, 2025 14:23
GitHub Actions Runner Idle Shutdown

This gist contains a solution to automatically deallocate an Azure VM running a GitHub Actions self-hosted runner once it has been idle for a specified period. It does so by monitoring the last-modified time of the runner's Worker logs in the _diag folder. If no Worker log has been modified within a set threshold (for example, 30 minutes), the script deallocates the VM through the Azure CLI.

Files

  • check_idle_worker.sh
    A Bash script that:

    • Scans the actions-runner/_diag directory for files whose names start with Worker_.
    • Calculates how long it has been since the most recent update.
    • Outputs the idle time.
    • If run with a threshold argument (in minutes) and the idle time meets or exceeds that threshold, it deallocates the VM.
  • idle-check.service
    A systemd service unit file that runs check_idle_worker.sh as a background service. It runs as the user azureuser and automatically restarts if it fails.

Setup Instructions

  1. Copy the Files to Your VM

    • Create a directory for your custom scripts (if it doesn’t exist):

      mkdir -p /home/azureuser/scripts
    • Copy the content of check_idle_worker.sh (provided in this gist) into /home/azureuser/scripts/check_idle_worker.sh.

    • Place the idle-check.service file (provided in this gist) in /etc/systemd/system/idle-check.service.

  2. Make the Script Executable

    chmod +x /home/azureuser/scripts/check_idle_worker.sh
    
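  3. Review & Edit the Script

    By default, the script looks for logs in /home/azureuser/actions-runner/_diag. Update the DIAG_DIR variable in check_idle_worker.sh if your runner is installed elsewhere. The script also expects a service principal credentials file at /home/azureuser/azure-creds.json (the CREDENTIALS_FILE variable) containing clientId, clientSecret, and tenantId, and it exits immediately if that file is missing.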

  4. Configure the Systemd Service

    The idle-check.service file runs the script with a 30-minute threshold. To change the threshold, adjust the argument passed on the ExecStart line.

  5. Reload, Enable, and Start the Service

    Run the following commands:

      sudo systemctl daemon-reload
      sudo systemctl enable idle-check.service
      sudo systemctl start idle-check.service

  6. Verify the Service

    Check the status with:

      sudo systemctl status idle-check.service

    And view live logs:

      journalctl -u idle-check.service -f
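
The setup steps do not list the command-line tools that check_idle_worker.sh calls. Reading the script, it relies on curl, jq, bc, and the Azure CLI (az). A minimal sketch of a check you could run before enabling the service follows; missing_tools is a hypothetical helper and is not part of the gist.

```shell
# Print the name of every tool passed as an argument that is not on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools curl jq bc az` prints one line per missing tool, so empty output suggests the script's dependencies are in place.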

How It Works

  • The script uses the find command to determine the most recent modification time among files starting with Worker_ in the _diag folder.
  • It calculates how many minutes have elapsed since that last modification.
  • Idle checks are skipped for the first 10 minutes after boot, so a freshly started VM is not deallocated before it picks up a job.
  • If the idle time meets or exceeds the provided threshold (30 minutes in the example), the script deallocates the VM with az vm deallocate.
  • The systemd service keeps the script running, re-checking every 60 seconds, so the VM deallocates itself when idle.
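
The idle calculation can be isolated into a small function for experimenting outside the service. This is a sketch under two assumptions: GNU find is available (for -printf), and idle_minutes_for_dir is a hypothetical name that does not appear in the gist. It takes the current time as an optional second argument so the result can be checked against known timestamps.

```shell
# Print whole minutes since the newest Worker_*.log in a directory was modified.
# $1 = directory to scan, $2 = "now" in epoch seconds (defaults to the current time).
# Returns 1 and prints nothing when no matching file exists.
idle_minutes_for_dir() {
    local dir="$1"
    local now="${2:-$(date +%s)}"
    local newest
    # GNU find prints each mtime as fractional epoch seconds; keep the largest.
    newest=$(find "$dir" -type f -name 'Worker_*.log' -printf '%T@\n' 2>/dev/null | sort -n | tail -1)
    [ -n "$newest" ] || return 1
    newest=${newest%.*}               # drop the fractional part
    echo $(( (now - newest) / 60 ))   # integer division rounds down
}
```

Running `idle_minutes_for_dir /home/azureuser/actions-runner/_diag` on the VM should print roughly the same idle time that the service logs.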

This setup helps minimize costs by ensuring that your VM automatically deallocates when no work is happening, while your CI process can always start the VM on demand.
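
For the start-on-demand half, one option is to have a CI job on a hosted runner call az vm start before queuing work for the self-hosted runner. The sketch below assumes the Azure CLI is already logged in with permission to start the VM. start_runner_vm and the DRY_RUN switch are hypothetical, and the resource group and VM names you pass in are your own.

```shell
# Start a deallocated runner VM.
# $1 = resource group, $2 = VM name.
# Set DRY_RUN=1 to print the command instead of running it.
start_runner_vm() {
    local resource_group="$1"
    local vm_name="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "az vm start --resource-group $resource_group --name $vm_name"
    else
        az vm start --resource-group "$resource_group" --name "$vm_name"
    fi
}
```

With DRY_RUN=1 set, `start_runner_vm my-rg my-runner-vm` only prints the az command, which is a cheap way to confirm the arguments before wiring it into a workflow.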

Feel free to modify the files as needed for your environment.

#!/bin/bash
# check_idle_worker.sh
#
# This script continuously calculates the idle time based on the most recent modification
# of files in the actions-runner/_diag directory that begin with "Worker_".
# It outputs the idle time and, if a threshold (in minutes) is provided as a command-line argument,
# it will deallocate the VM if the idle time exceeds that threshold.
#
# A grace period of 10 minutes after boot is used to prevent premature deallocation.
#
# Usage:
# ./check_idle_worker.sh [THRESHOLD_MINUTES] [--test]
#
# Example:
# ./check_idle_worker.sh 30
# will deallocate the VM if idle for 30 minutes (after the 10-minute grace period).
#
# To run in test mode (i.e. only print status and show what would be done without actually deallocating):
# ./check_idle_worker.sh 30 --test
#####################
# Configuration
#####################
# Path to the _diag folder (adjust if necessary)
DIAG_DIR="/home/azureuser/actions-runner/_diag"
# Idle threshold in minutes; if set to 0, no deallocation will occur.
THRESHOLD_MINUTES=${1:-0}
# Grace period after boot (in minutes) during which idle checks are skipped.
GRACE_PERIOD_MINUTES=10
# Path to the service principal credentials file (required)
CREDENTIALS_FILE="/home/azureuser/azure-creds.json"
#####################
# Pre-flight Checks
#####################
# Ensure the credentials file exists.
if [ ! -f "$CREDENTIALS_FILE" ]; then
    echo "Error: Credentials file $CREDENTIALS_FILE does not exist. Aborting."
    exit 1
fi
# Verify that the _diag directory exists.
if [ ! -d "$DIAG_DIR" ]; then
    echo "Error: Directory $DIAG_DIR does not exist."
    exit 1
fi
# jq parses both the instance metadata and the credentials file, so check for it first.
if ! command -v jq &>/dev/null; then
    echo "Error: 'jq' is required but not installed. Please install it (e.g., sudo apt-get install jq)."
    exit 1
fi
# Retrieve VM metadata to dynamically set RESOURCE_GROUP and VM_NAME.
metadata=$(curl -s -H "Metadata:true" "http://169.254.169.254/metadata/instance?api-version=2021-02-01")
if [ -z "$metadata" ]; then
    echo "Error: Unable to retrieve instance metadata."
    exit 1
fi
RESOURCE_GROUP=$(echo "$metadata" | jq -r '.compute.resourceGroupName')
VM_NAME=$(echo "$metadata" | jq -r '.compute.name')
echo "Detected Resource Group: $RESOURCE_GROUP"
echo "Detected VM Name: $VM_NAME"
# Pre-flight Azure login.
echo "Performing Azure login using service principal credentials from $CREDENTIALS_FILE..."
if ! az login --service-principal \
        --username "$(jq -r '.clientId' "$CREDENTIALS_FILE")" \
        --password "$(jq -r '.clientSecret' "$CREDENTIALS_FILE")" \
        --tenant "$(jq -r '.tenantId' "$CREDENTIALS_FILE")"; then
    echo "Azure login failed using service principal credentials. Aborting."
    exit 1
fi
echo "Azure login successful."
#####################
# Optional Test Mode Flag
#####################
TEST_MODE=0
if [[ "$2" == "--test" ]]; then
    TEST_MODE=1
    echo "Test mode enabled. No deallocation will occur, but logs will indicate what would happen."
fi
#####################
# Functions
#####################
# Get uptime in minutes from /proc/uptime.
get_uptime_minutes() {
    local uptime_seconds
    uptime_seconds=$(cut -d' ' -f1 /proc/uptime)
    echo "scale=2; $uptime_seconds / 60" | bc
}
# Get the most recent modification time (in seconds since epoch) from Worker log files.
get_recent_mod_time() {
    find "$DIAG_DIR" -type f -name "Worker_*.log" -printf '%T@\n' 2>/dev/null | sort -n | tail -1
}
#####################
# Main Loop
#####################
while true; do
    current_time=$(date +%s)
    uptime_minutes=$(get_uptime_minutes)
    echo "[$(date)] System uptime: ${uptime_minutes} minutes"
    # Skip idle check if within the grace period.
    if (( $(echo "$uptime_minutes < $GRACE_PERIOD_MINUTES" | bc -l) )); then
        echo "Within grace period (${GRACE_PERIOD_MINUTES} minutes). Skipping idle check."
        sleep 60
        continue
    fi
    # Get the most recent modification time among Worker logs.
    recent_mod_time=$(get_recent_mod_time)
    if [ -z "$recent_mod_time" ]; then
        echo "No Worker log files found in $DIAG_DIR. Unable to determine idle time."
        sleep 60
        continue
    fi
    # Convert recent_mod_time to an integer (seconds)
    recent_mod_time=${recent_mod_time%.*}
    idle_seconds=$(( current_time - recent_mod_time ))
    idle_minutes=$(echo "scale=2; $idle_seconds / 60" | bc)
    echo "[$(date)] Idle time based on Worker logs: ${idle_minutes} minutes"
    # If a threshold is provided, check if deallocation is required.
    if [ "$THRESHOLD_MINUTES" -gt 0 ]; then
        # Whole minutes, rounded down, so rounding never trips the threshold early.
        idle_minutes_int=$(( idle_seconds / 60 ))
        if [ "$idle_minutes_int" -ge "$THRESHOLD_MINUTES" ]; then
            echo "Idle time ($idle_minutes_int minutes) meets or exceeds threshold ($THRESHOLD_MINUTES minutes)."
            if [ "$TEST_MODE" -eq 1 ]; then
                echo "Test mode: would deallocate VM $VM_NAME in resource group $RESOURCE_GROUP now."
            else
                echo "Initiating deallocation..."
                echo "Deallocating VM $VM_NAME in resource group $RESOURCE_GROUP..."
                az vm deallocate --resource-group "$RESOURCE_GROUP" --name "$VM_NAME"
                exit 0
            fi
        else
            echo "Idle time ($idle_minutes_int minutes) is below the threshold. No deallocation."
        fi
    fi
    sleep 60
done
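
The script compares THRESHOLD_MINUTES with the integer test operator -gt, so a value such as 1.5 or abc would produce a shell error on every loop iteration rather than a clear message. A possible guard, sketched here with a hypothetical is_valid_threshold helper that is not part of the gist:

```shell
# Succeed only when the argument is a non-negative whole number of minutes.
is_valid_threshold() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
        *) return 0 ;;
    esac
}
```

One way to use it would be near the top of the script, for example `is_valid_threshold "$THRESHOLD_MINUTES" || { echo "Error: threshold must be a whole number of minutes."; exit 1; }`.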

# idle-check.service
[Unit]
Description=Idle Timeout Checker for GitHub Actions Runner
After=network.target

[Service]
Type=simple
# Adjust the path and threshold (30 minutes in this example) as needed.
ExecStart=/home/azureuser/scripts/check_idle_worker.sh 30
Restart=always
User=azureuser
Environment=PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

[Install]
WantedBy=multi-user.target
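
As an alternative to editing the unit file directly, the threshold can be changed with a systemd drop-in, which leaves the original file untouched. The sketch below writes such a drop-in; write_threshold_override is a hypothetical helper, and the script path is the one used in this gist.

```shell
# Write a systemd drop-in that replaces ExecStart with a new threshold.
# $1 = drop-in directory, $2 = threshold in minutes.
write_threshold_override() {
    local dropin_dir="$1"
    local minutes="$2"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<EOF
[Service]
# An empty ExecStart= clears the value inherited from the main unit file.
ExecStart=
ExecStart=/home/azureuser/scripts/check_idle_worker.sh $minutes
EOF
}
```

Run as root, `write_threshold_override /etc/systemd/system/idle-check.service.d 45` would set a 45-minute threshold. The change should take effect after `sudo systemctl daemon-reload` and `sudo systemctl restart idle-check.service`.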