Minimal reproductions of deadlock issues when using OTLP exporters (tonic/gRPC) with the OpenTelemetry Rust SDK on constrained tokio runtimes.
Related issues:
- open-telemetry/opentelemetry-rust#2802
- open-telemetry/opentelemetry-rust#2715
- open-telemetry/opentelemetry-rust#2539
- open-telemetry/opentelemetry-rust#2071
Tested with opentelemetry_sdk v0.31.0, opentelemetry-otlp v0.31.0.
The default thread-based processors (PeriodicReader, BatchSpanProcessor, BatchLogProcessor)
use dedicated OS threads that call futures_executor::block_on(exporter.export(...)).
When the exporter is tonic/gRPC, that export future needs the tokio reactor to drive its
HTTP/2 IO. The reactor is driven by tokio's worker threads, so if every worker thread is
blocked (for example, inside a force_flush() or shutdown() call), the reactor stalls and the
export can never complete → deadlock.
No experimental features or special configuration are needed to hit this: the default code path with the published crates is affected.
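The failure mode does not depend on OTel specifics and can be modeled with std threads and channels alone. In this sketch (all names are illustrative, not SDK API), a channel stands in for the reactor delivering an IO completion, and the "reactor" side never runs because the only worker thread is stuck in force_flush():

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Models the stalled-reactor deadlock: the export can only finish once the
// "reactor" delivers an IO completion, but the only worker thread that would
// drive the reactor is itself blocked waiting on the export.
fn flush_with_stalled_reactor() -> Result<(), &'static str> {
    // io_tx is the reactor's side; it is never used, because the worker that
    // would drive IO is "blocked in force_flush()".
    let (_io_tx, io_rx) = mpsc::channel::<()>();
    let (done_tx, done_rx) = mpsc::channel::<()>();

    // Processor thread: stands in for the PeriodicReader thread calling
    // futures_executor::block_on(exporter.export(...)).
    thread::spawn(move || {
        // The export waits for the reactor; with the reactor stalled this
        // would block forever, so we bound it to keep the example finite.
        if io_rx.recv_timeout(Duration::from_millis(100)).is_ok() {
            done_tx.send(()).ok();
        }
    });

    // force_flush(): wait for the export, again with a bound so the example
    // terminates instead of hanging like the real repro does.
    done_rx
        .recv_timeout(Duration::from_millis(500))
        .map_err(|_| "flush stalled: reactor never ran")
}

fn main() {
    println!("{:?}", flush_with_stalled_reactor());
}
```

The real examples below hit the same shape: the thing being waited on can only make progress on the thread doing the waiting.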
cargo run --example periodic_reader_current_thread
Uses the current_thread runtime (the #[tokio::test] default). Calls force_flush(), which
blocks the only tokio thread. The PeriodicReader thread's tonic export can't complete because
the reactor is stalled. Hangs forever.
cargo run --example periodic_reader_multi_thread_1_worker
Simulates a 1-vCPU Kubernetes pod. Calls force_flush() from inside tokio::spawn, blocking
the only worker thread. The entire runtime freezes; even tokio::time::sleep can't fire.
Hangs forever.
Note: the rt-tokio-current-thread feature does NOT help here; the runtime is multi_thread flavor.
cargo run --example batch_span_processor_current_thread
Same root cause as #1, but for spans. The BatchSpanProcessor has an internal 5-second timeout
on force_flush, so it returns Err(Timeout(5s)) instead of hanging forever. However, the
worker thread is permanently stuck and will never recover.
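That bounded-but-stuck behavior can be modeled the same way: a hypothetical force_flush waits on an ack with an internal deadline and reports a Timeout error, while the worker thread it was waiting on stays parked forever (the 5-second budget is shortened here so the example runs quickly; none of these names are the real SDK API):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

#[derive(Debug)]
enum FlushError {
    Timeout(Duration),
}

// Models the BatchSpanProcessor behavior: force_flush gives up after a
// budget and returns Timeout, but the worker it waited on never recovers.
fn force_flush_with_budget(budget: Duration) -> Result<(), FlushError> {
    let (ack_tx, ack_rx) = mpsc::channel::<()>();

    // Worker thread: permanently stuck, so the ack is never sent.
    // (park() in a loop stands in for the never-completing tonic export.)
    thread::spawn(move || {
        let _never_acked = ack_tx;
        loop {
            thread::park();
        }
    });

    ack_rx
        .recv_timeout(budget)
        .map_err(|_| FlushError::Timeout(budget))
}

fn main() {
    // The caller gets an error and moves on; the worker stays stuck.
    println!("{:?}", force_flush_with_budget(Duration::from_millis(200)));
}
```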
cargo run --example working_multi_thread
With multiple worker threads, other threads can drive the reactor while one is blocked. Completes immediately (with a connection error since no collector is running, but no hang).
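The healthy case can be modeled with the same std-only sketch: when a second worker thread is free to play the reactor, the IO completion is delivered and the flush finishes even while the first worker stays blocked (names are illustrative, not SDK API):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Models the multi-worker case: one worker is blocked in force_flush, but a
// second, free worker drives the "reactor", so the export still completes.
fn flush_with_spare_worker() -> Result<(), &'static str> {
    let (io_tx, io_rx) = mpsc::channel::<()>();
    let (done_tx, done_rx) = mpsc::channel::<()>();

    // Worker A: blocked, exactly like the single-worker deadlock case.
    thread::spawn(|| thread::sleep(Duration::from_millis(500)));

    // Worker B: free to drive IO; delivers the completion the export needs.
    thread::spawn(move || {
        io_tx.send(()).ok();
    });

    // Exporter thread: finishes once the "reactor" delivers the event.
    thread::spawn(move || {
        if io_rx.recv().is_ok() {
            done_tx.send(()).ok();
        }
    });

    done_rx
        .recv_timeout(Duration::from_secs(1))
        .map_err(|_| "flush stalled")
}

fn main() {
    println!("{:?}", flush_with_spare_worker());
}
```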