This document provides an overview of all metrics generated by the llm-d components.
## Overview
The llm-d system uses Prometheus as the primary metrics collection framework, with metrics covering inference performance, resource utilization,
error rates, and energy consumption across multiple components.
## Component Metrics
### 1. llm-d KV Cache Manager

**Status:** No Prometheus metrics currently implemented
The KV Cache Manager component does not currently expose Prometheus metrics directly. However, KV cache-related metrics are available through the Gateway API Inference Extension.
**Metrics Location:** All KV cache-related metrics are defined in `gateway-api-inference-extension/pkg/epp/metrics/metrics.go`
#### KV Cache Utilization Metrics
These metrics are exposed through the Gateway API Inference Extension but relate to KV cache functionality:
| Metric Name | Type | Description | Labels |
|---|---|---|---|
| `inference_pool_average_kv_cache_utilization` | Gauge | Average KV cache utilization per pool | `name` |
| `inference_extension_prefix_indexer_size` | Gauge | Size of the prefix indexer | - |
| `inference_extension_prefix_indexer_hit_ratio` | Histogram | Cache hit ratio distribution | - |
| `inference_extension_prefix_indexer_hit_bytes` | Histogram | Cache hit length distribution | - |
### 2. llm-d Inference Scheduler

**Metrics Location:** All inference scheduler metrics are defined in `gateway-api-inference-extension/pkg/epp/metrics/metrics.go`
The Inference Scheduler provides scheduling and performance metrics. To inform its scheduling decisions, it scrapes the following categories of metrics from individual inference server pods:
| Metric Category | Description |
|---|---|
| Queue Size Metrics | Number of requests waiting in queue |
| KV Cache Utilization | Percentage of KV cache currently in use |
| LoRA Adapter Metrics | Running and waiting LoRA adapters |
| Maximum Active Models | Capacity information |
### 3. llm-d Routing Sidecar

**Status:** No Prometheus metrics currently implemented
The routing sidecar component does not currently expose any Prometheus metrics. Adding them would require wiring a Prometheus client library into the component to instrument routing operations such as:

- Request routing latency
- Routing success/failure rates
- Target selection metrics
- Connection pool utilization
### 4. Gateway API Inference Extension

**Metrics Location:** The Gateway API Inference Extension metrics are primarily defined in two files:

- Main metrics: `gateway-api-inference-extension/pkg/epp/metrics/metrics.go` (most metrics)