@Pikachuxxxx
Created May 31, 2025 18:59
Fishbone diagrams to help debug CPU-bound and GPU-bound performance issues, based on Bruce Waggoner's presentation on saving Voyager 1
// Inspired By: https://www.youtube.com/watch?v=E6TS1c8KWFA
CPU‐Bound Performance Issues
\
\ **Draw Call Count**
\ \
\ \ Too many state changes per frame
\ \ Small batches (lots of tiny draw calls)
\
\ **Synchronization**
\ \
\ \ CPU stalls waiting on GPU (vkQueueWaitIdle, vkWaitForFences with too few frames in flight)
\ \ Overly broad pipeline barriers
\
\ **Resource Management Overhead**
\ \
\ \ Frequent vkUpdateDescriptorSets / rebinding
\ \ Excessive vkMapMemory / vkUnmapMemory
\ \ **Constant / Push‐Constant Writes**
\ \ \
\ \ \ vkCmdPushConstants calls in inner loop
\ \ \ Uniform‐buffer ring stalls (driver waits for in‐flight data)
\
\ **Wine / Translation Overhead**
\ \
\ \ Syscall marshalling cost
\ \ Translation of D3D calls to Vulkan
\
\ **Driver Validation / Tracking**
\ \
\ \ Command buffer recording validation
\ \ State‐tracking bookkeeping overhead
\
\ **Poor Command Buffer Recording**
\ \
\ \ Non‐batched barrier/transition calls
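A common mitigation for the **Draw Call Count** branch is to sort draws by a state key. This places identical pipeline and descriptor-set binds next to each other, so redundant binds can be skipped.

The following is a minimal, hypothetical Python model, not Vulkan code:
- `Draw`, `sort_key`, and `count_binds` are illustrative names.
- The integer ids stand in for `VkPipeline` / `VkDescriptorSet` handles.
- It assumes all pipelines share a compatible layout, so a pipeline change does not force a descriptor-set rebind.

```python
from typing import NamedTuple


class Draw(NamedTuple):
    """Hypothetical draw record; ids stand in for VkPipeline / VkDescriptorSet handles."""
    pipeline: int
    material: int
    mesh: int


def sort_key(d: Draw) -> tuple[int, int]:
    # Most expensive state change (pipeline) first, then descriptor set (material).
    return (d.pipeline, d.material)


def count_binds(draws: list[Draw]) -> int:
    """Count pipeline + descriptor-set binds, skipping binds that repeat the current state.

    Assumes compatible pipeline layouts, so a pipeline change does not
    force the descriptor set to be rebound.
    """
    binds = 0
    cur_pipeline = cur_material = None
    for d in draws:
        if d.pipeline != cur_pipeline:
            cur_pipeline = d.pipeline
            binds += 1
        if d.material != cur_material:
            cur_material = d.material
            binds += 1
    return binds


# Submission order that alternates state on every draw (worst case).
draws = [
    Draw(1, 10, 0), Draw(2, 20, 1), Draw(1, 11, 2), Draw(2, 20, 3),
    Draw(1, 10, 4), Draw(2, 21, 5), Draw(1, 11, 6), Draw(2, 20, 7),
]
before = count_binds(draws)
after = count_binds(sorted(draws, key=sort_key))
print(f"binds before sort: {before}, after sort: {after}")  # binds before sort: 16, after sort: 6
```

Real renderers usually pack more state (render pass, depth bucket, mesh) into an integer key. The design choice that matters is ordering the fields from most to least expensive to change.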
================================================================================================
GPU‐Bound Performance Issues
\
\ **Shader Complexity**
\ \
\ \ Poor LOD / ALU‐heavy ops
\ \ Divergence / branching
\
\ **Memory Bandwidth**
\ \
\ \ Texture sampling stalls
\ \ VRAM contention / tiling issues
\
\ **Overdraw & Blending**
\ \
\ \ Excessive transparency / blending
\ \ Too much overdraw (fill‐rate bound)
\
\ **Rasterization Overhead**
\ \
\ \ High MSAA sample counts / expensive anti‐aliasing
\ \ Heavy fragment shading per pixel
\
\ **Synchronization**
\ \
\ \ GPU waits on CPU (host‐set events, host‐signaled timeline semaphores)
\ \ Render‐pass / subpass barriers
\
\ **Driver Resource Management**
\ \
\ \ Inefficient resource transitions
\ \ Poor pipeline‐state creation / hashing
\ \ Descriptor‐set update overhead
\
\ **Constant / Push‐Constant Updates**
\ \
\ \ Push‐constant cache‐flush cost
\ \ GPU stalls waiting on new uniform data
\ \ Hidden pipeline stalls from small constant changes
\
\ **Inefficient Transitions**
\ \
\ \ Too‐fine‐grained vkCmdPipelineBarrier calls
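For the **Overdraw & Blending** branch, a toy depth buffer can model how draw order affects fill rate. This is a hedged Python sketch, not GPU code, with these assumptions:
- `shade_count` is a hypothetical helper.
- All geometry is opaque.
- An early depth test with a less-than compare rejects hidden fragments before the fragment shader runs.

Drawing the same rectangles front-to-back instead of back-to-front lets that test cull every hidden fragment.

```python
def shade_count(rects, width, height):
    """Count fragment-shader invocations for opaque rects drawn in list order.

    Assumes an early depth test (less-than compare) that rejects a fragment
    before shading when it is not closer than the stored depth.
    Each rect is (x0, y0, x1, y1, depth) with half-open pixel bounds.
    """
    depth = [[1.0] * width for _ in range(height)]  # cleared to the far plane
    shaded = 0
    for x0, y0, x1, y1, z in rects:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if z < depth[y][x]:  # passes early depth test -> shader runs
                    depth[y][x] = z
                    shaded += 1
    return shaded


WIDTH, HEIGHT = 8, 8
# Opaque rects as (x0, y0, x1, y1, depth); smaller depth = closer to the camera.
rects = [(0, 0, 8, 8, 0.9), (2, 2, 6, 6, 0.5), (0, 0, 4, 4, 0.1)]

back_to_front = shade_count(rects, WIDTH, HEIGHT)  # list is already far-to-near
front_to_back = shade_count(sorted(rects, key=lambda r: r[4]), WIDTH, HEIGHT)
pixels = WIDTH * HEIGHT
print(f"overdraw back-to-front: {back_to_front / pixels:.2f}")  # 1.50
print(f"overdraw front-to-back: {front_to_back / pixels:.2f}")  # 1.00
```

This only applies to opaque geometry; blended geometry still has to be drawn back-to-front. On many GPUs, a fragment shader that writes depth or uses `discard` can also disable early depth testing, which removes the benefit.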