To streamline network operations and accelerate incident response, we have developed a validated Ansible content collection that automates routine health checks on network devices. The collection provides modular roles that gather critical health information across multiple network platforms (Cisco IOS-XR, IOS-XE, NX-OS, and Arista EOS).
The cpu role monitors CPU utilization on each device and compares it with a configurable threshold, so that performance bottlenecks are identified before they affect the network.
- name: Monitor CPU utilization
  ansible.builtin.include_role:
    name: network.healthchecks.cpu
  vars:
    ansible_network_os: cisco.ios.ios
    cpu_threshold: 80
  ignore_errors: false
  register: cpu_result

- name: Display CPU health check results
  ansible.builtin.debug:
    var: cpu_result.health_checks
{
    "health_checks": {
        "cpu_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "cpu_status_summary": {
            "five_minute": 45,
            "five_seconds": 40,
            "one_minute": 42
        },
        "status": "successful"
    }
}
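A result like the one above can gate the rest of a play. The following is a minimal sketch, not part of the collection: it copies the documented output into a hypothetical variable named sample_cpu_result so that it runs on localhost without a network device, and uses ansible.builtin.assert to stop when the check is not successful or utilization reaches the threshold.

```yaml
# Sketch only: sample_cpu_result is a hypothetical stand-in that mirrors the
# output shown above, so the play runs without a network device.
- name: Gate a play on the CPU health check (illustrative)
  hosts: localhost
  gather_facts: false
  vars:
    sample_cpu_result:
      health_checks:
        cpu_utilization:
          check_status: successful
          current_utilization: 45
          threshold: 80
        status: successful
  tasks:
    - name: Fail early when the CPU check did not pass
      ansible.builtin.assert:
        that:
          - sample_cpu_result.health_checks.status == 'successful'
          - >-
            sample_cpu_result.health_checks.cpu_utilization.current_utilization
            < sample_cpu_result.health_checks.cpu_utilization.threshold
        fail_msg: "CPU health check failed"
        success_msg: "CPU health check passed"
```

In a real playbook you would replace sample_cpu_result with the variable that holds the role's output.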
The memory role checks memory usage against configurable limits, so that potential memory leaks or over-utilization are detected before they affect network stability.
- name: Monitor memory utilization
  ansible.builtin.include_role:
    name: network.healthchecks.memory
  vars:
    ansible_network_os: cisco.ios.ios
    memory_threshold: 80
    min_free_memory: 100
    min_buffers: 50
    min_cache: 50
  ignore_errors: false
  register: memory_result

- name: Display memory health check results
  ansible.builtin.debug:
    var: memory_result.health_checks
{
    "health_checks": {
        "memory_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "memory_free": {
            "check_status": "successful",
            "current_free": 150,
            "min_free": 100
        },
        "memory_buffers": {
            "check_status": "successful",
            "current_buffers": 75,
            "min_buffers": 50
        },
        "memory_cache": {
            "check_status": "successful",
            "current_cache": 60,
            "min_cache": 50
        },
        "memory_status_summary": {
            "total_mb": 1000,
            "used_mb": 450,
            "free_mb": 550,
            "buffers_mb": 75,
            "cache_mb": 60
        },
        "status": "successful"
    }
}
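The utilization figure can be related to the summary values with simple arithmetic. The sketch below assumes that utilization is used_mb divided by total_mb, expressed as a percentage; the role's actual calculation is not documented here, so treat this as an illustration of the arithmetic only. The variable names are local to the example.

```yaml
# Sketch only: assumes utilization = used_mb / total_mb * 100.
- name: Derive memory utilization from the summary (illustrative)
  hosts: localhost
  gather_facts: false
  vars:
    memory_status_summary:
      total_mb: 1000
      used_mb: 450
    memory_threshold: 80
  tasks:
    - name: Compute utilization as a whole-number percentage
      ansible.builtin.set_fact:
        utilization_pct: "{{ (memory_status_summary.used_mb * 100 / memory_status_summary.total_mb) | round | int }}"

    - name: Report utilization next to the threshold
      ansible.builtin.debug:
        msg: "Memory utilization {{ utilization_pct }}% (threshold {{ memory_threshold }}%)"
```

With the sample values, 450 MB used out of 1000 MB gives 45%, well under the 80% threshold.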
The uptime role monitors system uptime to identify unexpected reboots and confirm that devices have been continuously available.
- name: Monitor system uptime
  ansible.builtin.include_role:
    name: network.healthchecks.uptime
  vars:
    ansible_network_os: cisco.ios.ios
    uptime_threshold_minutes: 1440  # 24 hours
  ignore_errors: false
  register: uptime_result

- name: Display uptime health check results
  ansible.builtin.debug:
    var: uptime_result.health_checks
{
    "health_checks": {
        "uptime": {
            "check_status": "successful",
            "current_uptime": 35916,
            "min_uptime": 1440
        },
        "uptime_status_summary": {
            "weeks": 3,
            "days": 3,
            "hours": 22,
            "minutes": 36
        },
        "status": "successful"
    }
}
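An uptime expressed in minutes can be broken into weeks, days, hours, and minutes with integer division. The sketch below assumes uptime is counted in minutes, as the name uptime_threshold_minutes suggests; the value of uptime_minutes is a hypothetical sample chosen for illustration.

```yaml
# Sketch only: 10080 minutes per week, 1440 per day, 60 per hour.
- name: Break an uptime in minutes into calendar units (illustrative)
  hosts: localhost
  gather_facts: false
  vars:
    uptime_minutes: 35916  # hypothetical sample value
  tasks:
    - name: Split minutes into weeks, days, hours, and minutes
      ansible.builtin.set_fact:
        uptime_parts:
          weeks: "{{ uptime_minutes // 10080 }}"
          days: "{{ (uptime_minutes % 10080) // 1440 }}"
          hours: "{{ (uptime_minutes % 1440) // 60 }}"
          minutes: "{{ uptime_minutes % 60 }}"

    - name: Show the breakdown
      ansible.builtin.debug:
        var: uptime_parts
```

For 35916 minutes this works out to 3 weeks, 3 days, 22 hours, and 36 minutes.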
The environment role monitors power supplies, temperature sensors, and fan status, which helps catch hardware problems before they lead to failures.
- name: Monitor environment status
  ansible.builtin.include_role:
    name: network.healthchecks.environment
  vars:
    ansible_network_os: cisco.ios.ios
    temperature_threshold: 40
  ignore_errors: false
  register: environment_result

- name: Display environment health check results
  ansible.builtin.debug:
    var: environment_result.health_checks
{
    "health_checks": {
        "temperature": {
            "check_status": "successful",
            "current_temperature": 35,
            "threshold": 40
        },
        "power_supply": {
            "check_status": "successful",
            "status": "ok"
        },
        "fan_status": {
            "check_status": "successful",
            "status": "ok"
        },
        "status": "successful"
    }
}
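Because the environment result carries a separate check_status for each sub-check, each one can be verified individually. The following sketch loops over the three documented keys; sample_health_checks is a hypothetical variable that mirrors the output above so the play runs on localhost.

```yaml
# Sketch only: sample_health_checks mirrors the documented output shape.
- name: Check each environment sub-check individually (illustrative)
  hosts: localhost
  gather_facts: false
  vars:
    sample_health_checks:
      temperature:
        check_status: successful
        current_temperature: 35
        threshold: 40
      power_supply:
        check_status: successful
        status: ok
      fan_status:
        check_status: successful
        status: ok
  tasks:
    - name: Assert that every sub-check passed
      ansible.builtin.assert:
        that:
          - sample_health_checks[item].check_status == 'successful'
        fail_msg: "{{ item }} reported {{ sample_health_checks[item].check_status }}"
        quiet: true
      loop:
        - temperature
        - power_supply
        - fan_status
```

Looping this way names the exact sub-check that failed instead of reporting only the overall status.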
The filesystem role verifies that devices have adequate free storage and flags storage-related anomalies.
- name: Monitor filesystem status
  ansible.builtin.include_role:
    name: network.healthchecks.filesystem
  vars:
    ansible_network_os: cisco.ios.ios
    filesystem_threshold: 80
  ignore_errors: false
  register: filesystem_result

- name: Display filesystem health check results
  ansible.builtin.debug:
    var: filesystem_result.health_checks
{
    "health_checks": {
        "filesystem_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "filesystem_status_summary": {
            "total_mb": 1000,
            "used_mb": 450,
            "free_mb": 550
        },
        "status": "successful"
    }
}
- Regular Health Checks: Schedule automated health checks at regular intervals (e.g., daily or weekly)
- Threshold Configuration: Set appropriate thresholds based on your network's requirements
- Error Handling: Use ignore_errors judiciously to prevent playbook failures
- Result Processing: Implement result processing to trigger alerts or notifications
- Documentation: Maintain documentation of health check configurations and thresholds
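As one way to combine the error handling and result processing points, a block with a rescue section can record a failure and continue, instead of hiding it with ignore_errors. This is a minimal sketch under stated assumptions: the failing task is a stand-in for a health check role, simulate_failure and check_outcome are hypothetical names, and the debug task marks where an alerting or notification module would go.

```yaml
# Sketch only: ansible.builtin.fail stands in for a failing health check role.
- name: Handle a failing health check with block and rescue (illustrative)
  hosts: localhost
  gather_facts: false
  vars:
    simulate_failure: true  # hypothetical toggle for this sketch
  tasks:
    - name: Run a health check and record the outcome
      block:
        - name: Stand-in for a health check that fails
          ansible.builtin.fail:
            msg: "Simulated health check failure"
          when: simulate_failure | bool

        - name: Record success
          ansible.builtin.set_fact:
            check_outcome: successful
      rescue:
        - name: Record the failure instead of aborting the play
          ansible.builtin.set_fact:
            check_outcome: failed

    - name: Notify on failure (replace debug with your alerting module)
      ansible.builtin.debug:
        msg: "ALERT: health check {{ check_outcome }} on {{ inventory_hostname }}"
      when: check_outcome != 'successful'
```

Unlike ignore_errors, the rescue section leaves an explicit record of what failed, which later tasks can act on.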
- name: Network Health Check Playbook
  hosts: network_devices
  gather_facts: false
  tasks:
    - name: Run CPU health check
      ansible.builtin.include_role:
        name: network.healthchecks.cpu
      vars:
        cpu_threshold: 80
      register: cpu_result

    - name: Run memory health check
      ansible.builtin.include_role:
        name: network.healthchecks.memory
      vars:
        memory_threshold: 80
        min_free_memory: 100
      register: memory_result

    - name: Run uptime health check
      ansible.builtin.include_role:
        name: network.healthchecks.uptime
      vars:
        uptime_threshold_minutes: 1440
      register: uptime_result

    - name: Process health check results
      ansible.builtin.debug:
        msg: |
          CPU Status: {{ cpu_result.health_checks.status }}
          Memory Status: {{ memory_result.health_checks.status }}
          Uptime Status: {{ uptime_result.health_checks.status }}
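Beyond printing each status, a play can reduce the statuses to a single verdict. The sketch below is an assumption about how you might do this, not a feature of the collection: check_statuses is a hypothetical dictionary standing in for the three status values, and failed_checks is a name introduced for this example.

```yaml
# Sketch only: check_statuses stands in for the three status values above.
- name: Aggregate health check statuses (illustrative)
  hosts: localhost
  gather_facts: false
  vars:
    check_statuses:
      cpu: successful
      memory: successful
      uptime: successful
  tasks:
    - name: Collect the names of checks that did not succeed
      ansible.builtin.set_fact:
        failed_checks: "{{ check_statuses | dict2items | rejectattr('value', 'equalto', 'successful') | map(attribute='key') | list }}"

    - name: Fail the play when any check is unsuccessful
      ansible.builtin.assert:
        that:
          - failed_checks | length == 0
        fail_msg: "Unsuccessful checks: {{ failed_checks | join(', ') }}"
        success_msg: "All health checks passed"
```

Changing any value in check_statuses to something other than successful causes the assert task to fail and name the affected check.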
We encourage the networking field team to:
- Review these roles and provide feedback
- Test the roles in your environment
- Share use cases and requirements
- Report any issues or suggest improvements
For support and feedback, please:
- Open issues in the GitHub repository
- Contact the Ansible Network Content Team
- Join the Ansible Network Community