Network Health Check Validated Content - Automation Overview

Introduction

To streamline network operations and accelerate incident response, we have developed a validated Ansible content collection that automates routine health checks on network devices. The collection provides modular roles that gather critical health information across multiple network platforms: Cisco IOS-XR, IOS-XE, and NX-OS, and Arista EOS.
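The example playbooks below pass ansible_network_os to each role, so the target platform appears to be selected from that variable. In a mixed estate, a minimal inventory sketch could set it once per platform group instead. The group and host names are hypothetical, and ansible.netcommon.network_cli is assumed as the connection plugin for every platform.

```yaml
# Hypothetical inventory.yml; group and host names are placeholders.
all:
  children:
    network_devices:
      vars:
        # Assumption: every platform is reached over SSH via network_cli
        ansible_connection: ansible.netcommon.network_cli
      children:
        iosxr:
          hosts:
            xr-router-01:
          vars:
            ansible_network_os: cisco.iosxr.iosxr
        iosxe:
          hosts:
            xe-router-01:
          vars:
            ansible_network_os: cisco.ios.ios
        nxos:
          hosts:
            nx-switch-01:
          vars:
            ansible_network_os: cisco.nxos.nxos
        eos:
          hosts:
            eos-switch-01:
          vars:
            ansible_network_os: arista.eos.eos
```

With the platform set in inventory, the ansible_network_os line inside each role's vars can be dropped, which is how the Example Playbook Structure section is written.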

Validated Collection Roles Overview

CPU Health Check

Monitors CPU utilization on each device against a configurable threshold (cpu_threshold), proactively identifying potential performance bottlenecks.

Example Playbook

- name: Monitor CPU utilization
  ansible.builtin.include_role:
    name: network.healthchecks.cpu
  vars:
    ansible_network_os: cisco.ios.ios
    cpu_threshold: 80
    ignore_errors: false
  register: cpu_result

- name: Display CPU health check results
  ansible.builtin.debug:
    var: cpu_result.health_checks

Health Check Output

{
    "health_checks": {
        "cpu_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "cpu_status_summary": {
            "five_minute": 45,
            "five_seconds": 40,
            "one_minute": 42
        },
        "status": "successful"
    }
}
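Displaying the result helps interactively, but an unattended run needs a failure signal. A follow-up task can assert on the CPU check and fail the host when it does not pass. This sketch assumes the role's output is reachable as cpu_result.health_checks, as in the example playbook above, and uses the field names from the sample output; reading the values as percentages is also an assumption.

```yaml
- name: Fail the host if the CPU health check did not pass
  ansible.builtin.assert:
    that:
      # Field names taken from the sample output above
      - cpu_result.health_checks.cpu_utilization.check_status == 'successful'
    fail_msg: >-
      CPU check failed: utilization
      {{ cpu_result.health_checks.cpu_utilization.current_utilization }}%
      (threshold {{ cpu_result.health_checks.cpu_utilization.threshold }}%)
    success_msg: "CPU utilization is within the configured threshold"
```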

Memory Health Check

Checks memory usage against a utilization threshold and minimum free-memory, buffer, and cache values to detect potential memory leaks or over-utilization before they affect network stability.

Example Playbook

- name: Monitor memory utilization
  ansible.builtin.include_role:
    name: network.healthchecks.memory
  vars:
    ansible_network_os: cisco.ios.ios
    memory_threshold: 80
    min_free_memory: 100
    min_buffers: 50
    min_cache: 50
    ignore_errors: false
  register: memory_result

- name: Display memory health check results
  ansible.builtin.debug:
    var: memory_result.health_checks

Health Check Output

{
    "health_checks": {
        "memory_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "memory_free": {
            "check_status": "successful",
            "current_free": 550,
            "min_free": 100
        },
        "memory_buffers": {
            "check_status": "successful",
            "current_buffers": 75,
            "min_buffers": 50
        },
        "memory_cache": {
            "check_status": "successful",
            "current_cache": 60,
            "min_cache": 50
        },
        "memory_status_summary": {
            "total_mb": 1000,
            "used_mb": 450,
            "free_mb": 550,
            "buffers_mb": 75,
            "cache_mb": 60
        },
        "status": "successful"
    }
}
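Because the memory role reports four independent sub-checks, asserting on each field quickly gets verbose. An alternative sketch collects the name of every entry whose check_status is not successful. It relies only on the shape of the sample output: entries that carry a check_status key are treated as checks, while the summary dictionary and the overall status string are skipped. The variable name failed_memory_checks is hypothetical, and the results are assumed to be under memory_result.health_checks as in the example playbook.

```yaml
- name: Collect memory sub-checks that did not succeed
  ansible.builtin.set_fact:
    # Keep only entries that have a check_status, drop the successful ones,
    # and return the remaining keys (e.g. memory_free, memory_cache)
    failed_memory_checks: >-
      {{ memory_result.health_checks
         | dict2items
         | selectattr('value.check_status', 'defined')
         | rejectattr('value.check_status', 'equalto', 'successful')
         | map(attribute='key')
         | list }}

- name: Fail the host if any memory sub-check did not succeed
  ansible.builtin.fail:
    msg: "Failed memory checks: {{ failed_memory_checks | join(', ') }}"
  when: failed_memory_checks | length > 0
```

With the sample output above, failed_memory_checks is an empty list and the fail task is skipped.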

Uptime Health Check

Monitors system uptime against a minimum threshold (uptime_threshold_minutes) to detect unexpected reboots and help ensure reliable network availability.

Example Playbook

- name: Monitor system uptime
  ansible.builtin.include_role:
    name: network.healthchecks.uptime
  vars:
    ansible_network_os: cisco.ios.ios
    uptime_threshold_minutes: 1440  # 24 hours
    ignore_errors: false
  register: uptime_result

- name: Display uptime health check results
  ansible.builtin.debug:
    var: uptime_result.health_checks

Health Check Output

{
    "health_checks": {
        "uptime": {
            "check_status": "successful",
            "current_uptime": 35916,
            "min_uptime": 1440
        },
        "uptime_status_summary": {
            "weeks": 3,
            "days": 3,
            "hours": 22,
            "minutes": 36
        },
        "status": "successful"
    }
}
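The summary appears to be the uptime broken into calendar units. Assuming current_uptime is counted in minutes (the threshold variable is uptime_threshold_minutes), the fields follow from integer division by 10,080 minutes per week, 1,440 per day, and 60 per hour. The input value in this sketch is illustrative.

```yaml
- name: Break an uptime in minutes into weeks, days, hours, and minutes
  vars:
    # Illustrative value; in practice it would come from the role's result
    uptime_minutes: 35916
  ansible.builtin.debug:
    msg: >-
      {{ uptime_minutes // 10080 }} weeks,
      {{ (uptime_minutes % 10080) // 1440 }} days,
      {{ (uptime_minutes % 1440) // 60 }} hours,
      {{ uptime_minutes % 60 }} minutes
```

For 35916 minutes this prints "3 weeks, 3 days, 22 hours, 36 minutes", matching the summary fields in the sample output.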

Environment Health Check

Monitors power supplies, temperature sensors, and fan status, which is crucial for catching hardware problems before they cause failures.

Example Playbook

- name: Monitor environment status
  ansible.builtin.include_role:
    name: network.healthchecks.environment
  vars:
    ansible_network_os: cisco.ios.ios
    temperature_threshold: 40
    ignore_errors: false
  register: environment_result

- name: Display environment health check results
  ansible.builtin.debug:
    var: environment_result.health_checks

Health Check Output

{
    "health_checks": {
        "temperature": {
            "check_status": "successful",
            "current_temperature": 35,
            "threshold": 40
        },
        "power_supply": {
            "check_status": "successful",
            "status": "ok"
        },
        "fan_status": {
            "check_status": "successful",
            "status": "ok"
        },
        "status": "successful"
    }
}
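The environment result mixes a numeric check (temperature) with status strings for power supplies and fans. The sketch below fails the host when any of the three is unhealthy. It assumes the results are under environment_result.health_checks as in the example playbook, and that ok is the only healthy value for the status strings; other possible values are not documented here.

```yaml
- name: Fail the host if any environment check is unhealthy
  ansible.builtin.assert:
    that:
      - environment_result.health_checks.temperature.check_status == 'successful'
      # Assumption: "ok" is the only healthy status string
      - environment_result.health_checks.power_supply.status == 'ok'
      - environment_result.health_checks.fan_status.status == 'ok'
    fail_msg: >-
      Environment check failed: temperature
      {{ environment_result.health_checks.temperature.current_temperature }}
      (threshold {{ environment_result.health_checks.temperature.threshold }}),
      power supply {{ environment_result.health_checks.power_supply.status }},
      fans {{ environment_result.health_checks.fan_status.status }}
```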

Filesystem Health Check

Checks filesystem utilization against a configurable threshold (filesystem_threshold), ensuring devices have adequate storage and identifying storage-related anomalies.

Example Playbook

- name: Monitor filesystem status
  ansible.builtin.include_role:
    name: network.healthchecks.filesystem
  vars:
    ansible_network_os: cisco.ios.ios
    filesystem_threshold: 80
    ignore_errors: false
  register: filesystem_result

- name: Display filesystem health check results
  ansible.builtin.debug:
    var: filesystem_result.health_checks

Health Check Output

{
    "health_checks": {
        "filesystem_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "filesystem_status_summary": {
            "total_mb": 1000,
            "used_mb": 450,
            "free_mb": 550
        },
        "status": "successful"
    }
}
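The utilization figure can be cross-checked against the summary: used_mb as a share of total_mb gives 450 * 100 / 1000 = 45 for the sample output, the same value as current_utilization. A sketch of that cross-check follows, assuming results under filesystem_result.health_checks and a threshold expressed as a percentage; the variable names fs and used_pct are hypothetical.

```yaml
- name: Cross-check filesystem utilization against the reported threshold
  vars:
    fs: "{{ filesystem_result.health_checks }}"
    # Used space as a percentage of total capacity, from the summary fields
    used_pct: >-
      {{ (fs.filesystem_status_summary.used_mb * 100
          / fs.filesystem_status_summary.total_mb) | round(1) }}
  ansible.builtin.assert:
    that:
      - used_pct | float <= fs.filesystem_utilization.threshold | float
    fail_msg: "Filesystem {{ used_pct }}% used exceeds {{ fs.filesystem_utilization.threshold }}%"
    success_msg: "Filesystem {{ used_pct }}% used, within {{ fs.filesystem_utilization.threshold }}%"
```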

Automation Recommendations

Implementation Best Practices

Regular Health Checks: Schedule automated health checks at regular intervals (e.g., daily or weekly)
Threshold Configuration: Set appropriate thresholds based on your network's requirements
Error Handling: Use ignore_errors judiciously; it keeps a playbook running when a check fails, but it can also hide real failures
Result Processing: Process results to trigger alerts or notifications
Documentation: Maintain documentation of health check configurations and thresholds
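For the scheduling recommendation, one option on a plain control node is a cron entry managed with ansible.builtin.cron; automation controller schedules are an alternative. The playbook name, inventory path, and log path below are hypothetical placeholders.

```yaml
- name: Schedule recurring network health checks
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Run the health check playbook daily at 06:00
      ansible.builtin.cron:
        # The name identifies the entry, so rerunning this play updates it
        name: network health checks
        minute: "0"
        hour: "6"
        job: >-
          ansible-playbook -i /path/to/inventory.yml /path/to/health_checks.yml
          >> /var/log/network_health_checks.log 2>&1
```

A weekly run would add weekday: "0" (Sunday) to the same task.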

Example Playbook Structure

- name: Network Health Check Playbook
  hosts: network_devices
  gather_facts: false
  tasks:
    - name: Run CPU health check
      ansible.builtin.include_role:
        name: network.healthchecks.cpu
      vars:
        cpu_threshold: 80
      register: cpu_result

    - name: Run memory health check
      ansible.builtin.include_role:
        name: network.healthchecks.memory
      vars:
        memory_threshold: 80
        min_free_memory: 100
      register: memory_result

    - name: Run uptime health check
      ansible.builtin.include_role:
        name: network.healthchecks.uptime
      vars:
        uptime_threshold_minutes: 1440
      register: uptime_result

    - name: Process health check results
      ansible.builtin.debug:
        msg: |
          CPU Status: {{ cpu_result.health_checks.status }}
          Memory Status: {{ memory_result.health_checks.status }}
          Uptime Status: {{ uptime_result.health_checks.status }}
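The results can also be summarized for the whole run in one place. The task below is a sketch that could be appended to the tasks list above: it runs once, loops over every host still in the play, and reports only hosts where at least one check was not successful. It assumes each registered variable exposes health_checks.status, as the playbook above does.

```yaml
# Hypothetical addition to the end of the tasks list above
- name: Report devices with a non-successful health check
  ansible.builtin.debug:
    msg: >-
      {{ item }}:
      CPU={{ hostvars[item].cpu_result.health_checks.status }},
      memory={{ hostvars[item].memory_result.health_checks.status }},
      uptime={{ hostvars[item].uptime_result.health_checks.status }}
  loop: "{{ ansible_play_hosts }}"
  # Skip hosts where every check succeeded
  when: >-
    hostvars[item].cpu_result.health_checks.status != 'successful'
    or hostvars[item].memory_result.health_checks.status != 'successful'
    or hostvars[item].uptime_result.health_checks.status != 'successful'
  run_once: true
```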

Feedback and Support

We encourage the networking field team to:

Review these roles and provide feedback
Test the roles in your environment
Share use cases and requirements
Report any issues or suggest improvements

For support and feedback, please:

Open issues in the GitHub repository
Contact the Ansible Network Content Team
Join the Ansible Network Community

Ruchip16/# Network Health Check Validated Content - Automation Overview.md
