Reduce misleading `SLEEP` blocks by correlating work that crosses BEAM process boundaries, especially `GenServer.call/3`, while preserving the existing per-process stack profiler.
- Keep current per-process stack traces for compatibility.
- Attribute blocked caller time to downstream process work when possible.
- Represent cross-process activity as a trace tree/span graph, not as a fake merged call stack.
- Keep unattributed off-CPU time visible as fallback wait time instead of overemphasized `SLEEP` flame graph blocks.
- Build a small spike using `:seq_trace` around `GenServer.call/3`.
- Confirm that a root process can propagate a trace token across request/reply message flow.
- Verify we can reliably identify caller, callee, send time, receive time, reply time, and resume time.
- Abort this path if `:seq_trace` is too brittle or too expensive in practice.
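The spike can be sketched roughly as below, assuming a minimal echo server (`SeqTraceSpike.Echo` is invented for the probe). The key question is whether the `:trace_1` label shows up on both the request and the reply events delivered to the system tracer:

```elixir
# Spike sketch: does a seq_trace token set in the caller survive the
# GenServer.call/3 request/reply round trip?
defmodule SeqTraceSpike.Echo do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, nil, opts)

  @impl true
  def init(nil), do: {:ok, nil}

  @impl true
  def handle_call({:echo, msg}, _from, state), do: {:reply, msg, state}
end

defmodule SeqTraceSpike do
  def run do
    {:ok, server} = SeqTraceSpike.Echo.start_link()

    # A plain process that accumulates seq_trace system-tracer events.
    tracer = spawn(fn -> collect([]) end)
    :seq_trace.set_system_tracer(tracer)

    # Tag the caller; the token rides along on every send/receive from here on.
    :seq_trace.set_token(:label, :trace_1)
    :seq_trace.set_token(:send, true)
    :seq_trace.set_token(:receive, true)
    :seq_trace.set_token(:timestamp, true)

    :pong = GenServer.call(server, {:echo, :pong})

    # Clear this process's token before collecting results.
    :seq_trace.set_token([])

    send(tracer, {:dump, self()})

    receive do
      # Each event looks like {:seq_trace, label, {:send | :receive, serial,
      # from, to, msg}, timestamp}; seeing :trace_1 on both the request and
      # the reply proves the token propagated across the call.
      {:events, events} -> events
    end
  end

  defp collect(acc) do
    receive do
      {:dump, from} -> send(from, {:events, Enum.reverse(acc)})
      event -> collect([event | acc])
    end
  end
end
```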
- Keep existing `:call`, `:return_to`, and `:running` tracing.
- Add the minimum message/process tracing needed for correlation, likely `:send`, `:receive`, and `:procs`.
- Capture enough metadata to infer edges such as:
  - caller sends request
  - callee receives request
  - callee executes work
  - callee replies
  - caller resumes
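Those edges could be inferred from raw trace messages along these lines. The `{:"$gen_call", from, request}` shape is `GenServer`'s internal call protocol and can differ across OTP versions, so the patterns here are illustrative, not a stable contract:

```elixir
# Sketch: map raw trace messages (from :erlang.trace(pid, true,
# [:send, :receive, :timestamp])) to cross-process edges.
defmodule FlameOn.Client.EdgeInference do
  # Caller sends a GenServer call request to the callee.
  def infer({:trace_ts, caller, :send, {:"$gen_call", _from, _req}, callee, ts}),
    do: {:request_sent, caller, callee, ts}

  # Callee receives the request; the from tuple carries the caller pid.
  def infer({:trace_ts, callee, :receive, {:"$gen_call", {caller, _tag}, _req}, ts}),
    do: {:request_received, caller, callee, ts}

  # GenServer replies as {tag, reply}; the tag is a ref (an alias ref on
  # recent OTP), which is how a reply pairs with its pending call.
  def infer({:trace_ts, callee, :send, {tag, _reply}, caller, ts}) when is_reference(tag),
    do: {:reply_sent, callee, caller, ts}

  # Caller receives the reply and resumes.
  def infer({:trace_ts, caller, :receive, {tag, _reply}, ts}) when is_reference(tag),
    do: {:caller_resumed, caller, ts}

  def infer(_other), do: :ignore
end
```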
- Keep `FlameOn.Client.TraceSession` focused on a single process timeline.
- Add a new `TraceGraphSession` GenServer to own one logical request trace.
- Have `TraceGraphSession` manage:
  - root trace metadata
  - per-process trace sessions
  - caller/callee relationships
  - wait edges and timing
  - graph finalization and shipping
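A possible skeleton for that session process follows; the module layout, state fields, and cast messages are assumptions for illustration, not existing FlameOn code:

```elixir
# Sketch: one TraceGraphSession owns one logical request trace and stitches
# per-process sessions together.
defmodule FlameOn.Client.TraceGraphSession do
  use GenServer

  defstruct trace_id: nil,
            root_pid: nil,
            # pid => per-process trace session state (opaque here)
            sessions: %{},
            # callee pid => caller pid
            callers: %{},
            # {caller, callee, wait_started_at, wait_ended_at | nil}
            wait_edges: []

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  @impl true
  def init(opts) do
    {:ok, %__MODULE__{trace_id: opts[:trace_id], root_pid: opts[:root_pid]}}
  end

  # Record that `caller` is blocked (e.g. in GenServer.call/3) on `callee`.
  @impl true
  def handle_cast({:wait_started, caller, callee, ts}, state) do
    {:noreply, %{state | wait_edges: [{caller, callee, ts, nil} | state.wait_edges]}}
  end

  # Close the open wait edge when the caller resumes.
  def handle_cast({:wait_ended, caller, ts}, state) do
    edges =
      Enum.map(state.wait_edges, fn
        {^caller, callee, started, nil} -> {caller, callee, started, ts}
        edge -> edge
      end)

    {:noreply, %{state | wait_edges: edges}}
  end
end
```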
- Limit the first implementation to `GenServer.call/3`-style request/reply behavior.
- When the root process blocks on another process, attach the callee work to the same trace id.
- Model the caller's blocked interval as explicit wait time on a known callee rather than anonymous `SLEEP`.
- Defer arbitrary message-passing patterns until the request/reply case is stable.
- Introduce a graph/span representation alongside collapsed stacks.
- Each process span should include fields such as `trace_id`, `span_id`, `parent_span_id`, `pid`, `started_at`, `ended_at`, `self_us`, `wait_us`, `waiting_on_pid`, and `children`.
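As a sketch, those fields could live in a struct like the following (the module name is a placeholder):

```elixir
# One possible span struct for the graph representation.
defmodule FlameOn.Client.ProcessSpan do
  @enforce_keys [:trace_id, :span_id, :pid]
  defstruct [
    :trace_id,
    :span_id,
    :parent_span_id,
    :pid,
    # monotonic timestamps bounding this process's participation
    :started_at,
    :ended_at,
    # microseconds spent executing on-CPU in this process
    self_us: 0,
    # microseconds spent blocked waiting on other processes
    wait_us: 0,
    waiting_on_pid: nil,
    # child spans for downstream processes this one called into
    children: []
  ]
end
```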
- Preserve current collapsed stack output so existing shippers and consumers do not break.
- Finalize each per-process stack as today.
- Let the graph session stitch those process-local timelines together.
- If a caller wait interval is fully explained by a traced callee, mark that interval as attributed wait.
- If no callee can be correlated, retain it as unattributed off-CPU wait.
- Do not pretend cross-process work is one continuous stack.
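One way to encode the attribution rule above, as a sketch: timestamps are assumed to be microseconds, and "fully explained" is simplified to "a traced callee span sits inside the wait interval" (a real implementation would also compare durations before claiming full attribution):

```elixir
# Sketch: classify a caller wait interval as attributed or unattributed.
defmodule FlameOn.Client.WaitAttribution do
  def classify({wait_start, wait_end}, callee_spans) do
    explaining =
      Enum.find(callee_spans, fn %{started_at: s, ended_at: e} ->
        # The callee's execution sits inside the caller's blocked interval.
        s >= wait_start and e <= wait_end
      end)

    case explaining do
      nil -> {:unattributed_wait, wait_end - wait_start}
      span -> {:attributed_wait, span.span_id, wait_end - wait_start}
    end
  end
end
```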
- Render linked process spans in the UI or agent-facing output.
- Change default analysis behavior to de-emphasize `SLEEP`:
  - hide synthetic sleep frames when fully attributed to downstream work
  - keep unattributed wait time visible as `WAITING` or `OFF_CPU`
  - optionally expose a toggle to include raw sleep frames
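The default rendering rule might look like the following sketch; the labels come from the list above, while the `:hidden` convention and function shape are invented:

```elixir
# Sketch: decide what label (if any) a wait interval gets in the flame graph.
defmodule FlameOn.Client.WaitDisplay do
  def frame_label(wait, include_raw_sleep? \\ false)

  # Raw sleep frames stay available behind a toggle.
  def frame_label(_wait, true), do: "SLEEP"

  # Fully attributed waits disappear from the flame graph; that time shows
  # up under the callee's span instead.
  def frame_label({:attributed_wait, _span_id, _us}, false), do: :hidden

  # Unattributed waits remain visible, just under a more honest name.
  def frame_label({:unattributed_wait, _us}, false), do: "WAITING"
end
```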
- Add tests before implementation for:
  - existing single-process traces remaining unchanged
  - `GenServer.call/3` correlation between caller wait and callee execution
  - nested hops like `A -> B -> C`
  - unattributed wait remaining visible
  - crashes or exits in downstream processes
  - threshold and sampling behavior across graph traces
- Add focused integration tests under `test/flame_on/client/` for both per-process and graph-level behavior.
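An integration test for the request/reply case might look like this; note that `TraceGraphSession.start_trace/1` and `finish_trace/1` are the proposed API shape, not functions that exist today, and `SlowEcho` is a test-support server invented here:

```elixir
# Sketch of a correlation test for the GenServer.call/3 case.
defmodule FlameOn.Client.TraceGraphSessionTest do
  use ExUnit.Case, async: false

  defmodule SlowEcho do
    use GenServer
    def start_link(_), do: GenServer.start_link(__MODULE__, nil)
    def init(nil), do: {:ok, nil}

    def handle_call(:work, _from, state) do
      # Deliberate blocking work so the caller accumulates visible wait time.
      Process.sleep(50)
      {:reply, :done, state}
    end
  end

  test "caller wait is attributed to callee execution for GenServer.call/3" do
    {:ok, server} = SlowEcho.start_link([])

    {:ok, session} = FlameOn.Client.TraceGraphSession.start_trace(root_pid: self())
    :done = GenServer.call(server, :work)
    graph = FlameOn.Client.TraceGraphSession.finish_trace(session)

    # The ~50ms block should be a wait edge on a known callee, not SLEEP.
    [wait_edge] = graph.wait_edges
    assert wait_edge.callee == server
    assert wait_edge.attributed?
  end
end
```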
- Make cross-process tracing opt-in via config.
- Limit max descendant processes per root trace.
- Limit trace lifetime and fanout.
- Continue sampling before enabling graph capture.
- Measure mailbox growth, trace volume, and overhead under concurrent load.
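The opt-in flag and limits could surface as application config, for example (all key names here are suggestions, not existing FlameOn options):

```elixir
# config/config.exs fragment (hypothetical keys)
config :flame_on,
  # cross-process graph capture stays off unless explicitly enabled
  cross_process_tracing: false,
  # cap the number of descendant processes attached to one root trace
  max_descendants_per_trace: 25,
  # abandon traces that outlive this window
  max_trace_lifetime_ms: 5_000,
  # sample before graph capture to bound overhead under load
  graph_sample_rate: 0.01
```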
- Update `README.md` to describe the distinction between:
  - per-process stack tracing
  - cross-process causal tracing
- Document that process boundaries produce linked spans, not continued stack frames.
- Explain how unattributed wait time is represented.
- Prototype `:seq_trace` correlation around `GenServer.call/3`.
- Prove sender/receiver stitching in tests.
- Introduce `TraceGraphSession` and a per-process session registry.
- Export graph data without changing current collapsed stack shipping.
- Rework `SLEEP` handling once attribution is trustworthy.
`:seq_trace` is the biggest uncertainty. It may be subtle to operate correctly across real-world libraries, and it may add enough complexity or overhead that explicit app-level correlation becomes the better fallback.
If VM-level correlation is not viable:
- keep the current per-process profiler
- propagate trace ids explicitly across known boundaries
- start child process traces under the same logical trace
- ship a stitched span graph built from app-level correlation instead of pure VM trace events
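The fallback can be sketched as below, with the trace id traveling inside the request itself; `FlameOn.TraceContext` is a hypothetical helper for joining and leaving a logical trace, and `Worker` is an invented example server:

```elixir
# Sketch: app-level trace id propagation across a known GenServer boundary,
# used only if VM-level (:seq_trace) correlation proves unworkable.
defmodule Worker do
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, nil)

  @impl true
  def init(nil), do: {:ok, nil}

  # The caller threads its trace id through the request explicitly.
  def do_work(server, trace_id, args) do
    GenServer.call(server, {:do_work, trace_id, args})
  end

  @impl true
  def handle_call({:do_work, trace_id, args}, _from, state) do
    # Join the caller's logical trace before doing any work, so this
    # process's samples are stitched under the same trace id.
    FlameOn.TraceContext.join(trace_id)
    result = perform(args)
    FlameOn.TraceContext.leave()
    {:reply, result, state}
  end

  defp perform(args), do: args
end
```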