Network Health Check Validated Content - Automation Overview

Introduction

To streamline network operations and accelerate incident response, we have developed a validated Ansible content collection that automates routine health checks on network devices. The collection provides modular roles that gather critical health information across multiple network platforms (Cisco IOS-XR, IOS-XE, NX-OS, and Arista EOS).

Validated Collection Roles Overview

CPU Health Check

Monitors CPU utilization on each device and compares it against a configurable threshold, flagging potential performance bottlenecks before they affect the network.

Example Playbook

- name: Monitor CPU utilization
  ansible.builtin.include_role:
    name: network.healthchecks.cpu
  vars:
    ansible_network_os: cisco.ios.ios
    cpu_threshold: 80
    ignore_errors: false
  register: cpu_result

- name: Display CPU health check results
  ansible.builtin.debug:
    var: cpu_result.health_checks

Health Check Output

{
    "health_checks": {
        "cpu_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "cpu_status_summary": {
            "five_minute": 45,
            "five_seconds": 40,
            "one_minute": 42
        },
        "status": "successful"
    }
}
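The result can also act as a hard gate in a playbook. The following is a minimal sketch, assuming `cpu_result` is registered as in the example playbook and has the structure shown in the sample output; the messages are illustrative, and the second condition is an assumed sanity check rather than documented role behavior.

```yaml
# Sketch only: assumes cpu_result.health_checks matches the sample output above.
- name: Stop the play if the CPU health check did not pass
  ansible.builtin.assert:
    that:
      - cpu_result.health_checks.status == 'successful'
      # Assumed extra guard: re-compare the reported value with the threshold.
      - >-
        cpu_result.health_checks.cpu_utilization.current_utilization | int
        <= cpu_result.health_checks.cpu_utilization.threshold | int
    fail_msg: >-
      CPU utilization is {{ cpu_result.health_checks.cpu_utilization.current_utilization }}
      (threshold {{ cpu_result.health_checks.cpu_utilization.threshold }}).
    success_msg: CPU utilization is within the configured threshold.
```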

Memory Health Check

Checks memory usage against configurable limits to detect potential memory leaks or over-utilization before they affect network stability.

Example Playbook

- name: Monitor memory utilization
  ansible.builtin.include_role:
    name: network.healthchecks.memory
  vars:
    ansible_network_os: cisco.ios.ios
    memory_threshold: 80
    min_free_memory: 100
    min_buffers: 50
    min_cache: 50
    ignore_errors: false
  register: memory_result

- name: Display memory health check results
  ansible.builtin.debug:
    var: memory_result.health_checks

Health Check Output

{
    "health_checks": {
        "memory_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "memory_free": {
            "check_status": "successful",
            "current_free": 150,
            "min_free": 100
        },
        "memory_buffers": {
            "check_status": "successful",
            "current_buffers": 75,
            "min_buffers": 50
        },
        "memory_cache": {
            "check_status": "successful",
            "current_cache": 60,
            "min_cache": 50
        },
        "memory_status_summary": {
            "total_mb": 1000,
            "used_mb": 450,
            "free_mb": 550,
            "buffers_mb": 75,
            "cache_mb": 60
        },
        "status": "successful"
    }
}
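Because the memory result contains several sub-checks, it can be useful to list only those that did not pass. This is a rough sketch, assuming `memory_result.health_checks` has the structure shown in the sample output; `failed_memory_checks` is a hypothetical fact name introduced for illustration.

```yaml
# Sketch only: assumes memory_result.health_checks matches the sample output above.
- name: Collect the names of memory sub-checks that did not succeed
  ansible.builtin.set_fact:
    # "failed_memory_checks" is a hypothetical fact name.
    failed_memory_checks: >-
      {{ memory_result.health_checks
         | dict2items
         | selectattr('value.check_status', 'defined')
         | rejectattr('value.check_status', 'equalto', 'successful')
         | map(attribute='key')
         | list }}

- name: Report memory sub-checks that need attention
  ansible.builtin.debug:
    msg: "Memory sub-checks needing attention: {{ failed_memory_checks | join(', ') }}"
  when: failed_memory_checks | length > 0
```

The `selectattr(..., 'defined')` step skips entries such as `status` and `memory_status_summary`, which carry no `check_status` field.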

Uptime Health Check

Monitors system uptime to identify unexpected reboots and ensure reliable network availability.

Example Playbook

- name: Monitor system uptime
  ansible.builtin.include_role:
    name: network.healthchecks.uptime
  vars:
    ansible_network_os: cisco.ios.ios
    uptime_threshold_minutes: 1440  # 24 hours
    ignore_errors: false
  register: uptime_result

- name: Display uptime health check results
  ansible.builtin.debug:
    var: uptime_result.health_checks

Health Check Output

{
    "health_checks": {
        "uptime": {
            "check_status": "successful",
            "current_uptime": 35916,
            "min_uptime": 1440
        },
        "uptime_status_summary": {
            "weeks": 3,
            "days": 3,
            "hours": 22,
            "minutes": 36
        },
        "status": "successful"
    }
}
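The summary reports uptime as weeks, days, hours, and minutes, while the threshold is expressed in minutes. The sketch below converts the summary to total minutes (1 week = 10080 minutes, 1 day = 1440 minutes) before comparing. It assumes the four summary fields are additive components of a single duration and that `uptime_result` has the structure shown in the sample output; `uptime_total_minutes` is a hypothetical fact name.

```yaml
# Sketch only: assumes uptime_result.health_checks matches the sample output above.
- name: Convert the uptime summary to total minutes
  vars:
    summary: "{{ uptime_result.health_checks.uptime_status_summary }}"
  ansible.builtin.set_fact:
    # "uptime_total_minutes" is a hypothetical fact name.
    uptime_total_minutes: >-
      {{ (summary.weeks | int) * 10080
         + (summary.days | int) * 1440
         + (summary.hours | int) * 60
         + (summary.minutes | int) }}

- name: Warn about a recent reboot
  ansible.builtin.debug:
    msg: >-
      {{ inventory_hostname }} has been up for {{ uptime_total_minutes }} minutes,
      below the minimum of {{ uptime_result.health_checks.uptime.min_uptime }}.
  when: uptime_total_minutes | int < uptime_result.health_checks.uptime.min_uptime | int
```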

Environment Health Check

Monitors power supplies, temperature sensors, and fans so that failing hardware is detected before it causes an outage.

Example Playbook

- name: Monitor environment status
  ansible.builtin.include_role:
    name: network.healthchecks.environment
  vars:
    ansible_network_os: cisco.ios.ios
    temperature_threshold: 40
    ignore_errors: false
  register: environment_result

- name: Display environment health check results
  ansible.builtin.debug:
    var: environment_result.health_checks

Health Check Output

{
    "health_checks": {
        "temperature": {
            "check_status": "successful",
            "current_temperature": 35,
            "threshold": 40
        },
        "power_supply": {
            "check_status": "successful",
            "status": "ok"
        },
        "fan_status": {
            "check_status": "successful",
            "status": "ok"
        },
        "status": "successful"
    }
}
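Hardware components can be checked in a loop to surface any that are unhealthy. This is only a sketch: it assumes `environment_result.health_checks` has the structure shown in the sample output and that `ok` is the value reported for a healthy component.

```yaml
# Sketch only: assumes environment_result.health_checks matches the sample output above.
- name: Warn about hardware components that are not healthy
  ansible.builtin.debug:
    msg: "{{ item }} reports status '{{ environment_result.health_checks[item].status }}'."
  loop:
    - power_supply
    - fan_status
  # Assumption: 'ok' is the healthy value, as in the sample output.
  when: environment_result.health_checks[item].status != 'ok'
```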

Filesystem Health Check

Verifies filesystem health, ensuring devices have adequate storage and identifying storage-related anomalies.

Example Playbook

- name: Monitor filesystem status
  ansible.builtin.include_role:
    name: network.healthchecks.filesystem
  vars:
    ansible_network_os: cisco.ios.ios
    filesystem_threshold: 80
    ignore_errors: false
  register: filesystem_result

- name: Display filesystem health check results
  ansible.builtin.debug:
    var: filesystem_result.health_checks

Health Check Output

{
    "health_checks": {
        "filesystem_utilization": {
            "check_status": "successful",
            "current_utilization": 45,
            "threshold": 80
        },
        "filesystem_status_summary": {
            "total_mb": 1000,
            "used_mb": 450,
            "free_mb": 550
        },
        "status": "successful"
    }
}
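In the sample output, the utilization figure is consistent with the summary: 450 MB used out of 1000 MB is 450 / 1000 × 100 = 45. Assuming the role derives `current_utilization` this way (the source does not state it), the same calculation can be reproduced from the summary, for example to report it with more precision.

```yaml
# Sketch only: assumes filesystem_result.health_checks matches the sample output above.
- name: Derive filesystem utilization from the summary
  vars:
    fs: "{{ filesystem_result.health_checks.filesystem_status_summary }}"
  ansible.builtin.debug:
    msg: >-
      Filesystem is {{ (fs.used_mb / fs.total_mb * 100) | round(1) }} percent full,
      with {{ fs.free_mb }} MB free.
  # Guard against division by zero if a device reports no total size.
  when: fs.total_mb | int > 0
```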

Automation Recommendations

Implementation Best Practices

  1. Regular Health Checks: Schedule automated health checks at regular intervals (e.g., daily or weekly).
  2. Threshold Configuration: Set thresholds that reflect the normal operating range of your devices rather than relying on a single value for every platform.
  3. Error Handling: Use ignore_errors judiciously, so that one failed check does not abort the remaining checks but genuine failures are not masked.
  4. Result Processing: Process the registered results to trigger alerts or notifications when a check fails.
  5. Documentation: Maintain documentation of health check configurations and thresholds.
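As one possible approach to the error-handling recommendation, a `block` with a `rescue` section lets a failed check be recorded while the rest of the play continues. This is a sketch, not part of the collection: the rescue message is illustrative, and `ansible_failed_result` is the variable Ansible sets inside a `rescue` section.

```yaml
# Sketch only: the rescue behavior and message are illustrative.
- name: Run the CPU health check without aborting the rest of the play
  block:
    - name: Run CPU health check
      ansible.builtin.include_role:
        name: network.healthchecks.cpu
      vars:
        cpu_threshold: 80
  rescue:
    - name: Record the failure and continue
      ansible.builtin.debug:
        msg: >-
          CPU health check failed on {{ inventory_hostname }}:
          {{ ansible_failed_result.msg | default('no message available') }}
```

Unlike a blanket `ignore_errors: true`, this keeps the failure visible in the play output and gives a single place to attach a notification task.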

Example Playbook Structure

- name: Network Health Check Playbook
  hosts: network_devices
  gather_facts: false
  tasks:
    - name: Run CPU health check
      ansible.builtin.include_role:
        name: network.healthchecks.cpu
      vars:
        cpu_threshold: 80
      register: cpu_result

    - name: Run memory health check
      ansible.builtin.include_role:
        name: network.healthchecks.memory
      vars:
        memory_threshold: 80
        min_free_memory: 100
      register: memory_result

    - name: Run uptime health check
      ansible.builtin.include_role:
        name: network.healthchecks.uptime
      vars:
        uptime_threshold_minutes: 1440
      register: uptime_result

    - name: Process health check results
      ansible.builtin.debug:
        msg: |
          CPU Status: {{ cpu_result.health_checks.status }}
          Memory Status: {{ memory_result.health_checks.status }}
          Uptime Status: {{ uptime_result.health_checks.status }}
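To turn the summary into an actionable result, a final task can fail the play when any check did not succeed. The fragment below is a sketch meant to be appended to the `tasks` list of the playbook above; it assumes each registered result exposes `health_checks.status` as shown in the sample outputs, and `statuses` is a hypothetical variable name.

```yaml
    # Sketch only: append to the tasks list of the playbook above.
    - name: Fail the play if any health check did not succeed
      vars:
        # "statuses" is a hypothetical helper variable.
        statuses:
          cpu: "{{ cpu_result.health_checks.status }}"
          memory: "{{ memory_result.health_checks.status }}"
          uptime: "{{ uptime_result.health_checks.status }}"
      ansible.builtin.fail:
        msg: >-
          Failed health checks:
          {{ statuses | dict2items | rejectattr('value', 'equalto', 'successful')
             | map(attribute='key') | join(', ') }}
      when: >-
        statuses | dict2items | rejectattr('value', 'equalto', 'successful')
        | list | length > 0
```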

Feedback and Support

We encourage the networking field team to:

  1. Review these roles and provide feedback
  2. Test the roles in your environment
  3. Share use cases and requirements
  4. Report any issues or suggest improvements

For support and feedback, please:

  • Open issues in the GitHub repository
  • Contact the Ansible Network Content Team
  • Join the Ansible Network Community