Created
May 31, 2025 18:59
Fishbone diagrams to help debug CPU/GPU-bound issues, based on Bruce Waggoner's presentation on saving Voyager 1
// Inspired By: https://www.youtube.com/watch?v=E6TS1c8KWFA
CPU‐Bound Performance Issues
\
\ **Draw Call Count**
\ \
\ \ Too many state changes per frame
\ \ Small batches (lots of tiny draw calls)
\
\ **Synchronization**
\ \
\ \ CPU stalls waiting on GPU (vkQueueWaitIdle, fences)
\ \ Overly broad pipeline barriers
\
\ **Resource Management Overhead**
\ \
\ \ Frequent vkUpdateDescriptorSets / rebinding
\ \ Excessive vkMapMemory / vkUnmapMemory
\ \ **Constant / Push‐Constant Writes**
\ \ \
\ \ \ vkCmdPushConstants calls in inner loop
\ \ \ Uniform‐buffer ring stalls (driver waits for in‐flight data)
\
\ **Wine / Translation Overhead**
\ \
\ \ Syscall marshalling cost
\ \ Translation of D3D calls to Vulkan
\
\ **Driver Validation / Tracking**
\ \
\ \ Command buffer recording validation
\ \ State‐tracking bookkeeping overhead
\
\ **Poor Command Buffer Recording**
\
\ Non‐batched barrier/transition calls
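
As a rough illustration of the "Draw Call Count" branch above, the sketch below counts how many pipeline and material rebinds a given submission order would cause, then sorts the draws by state to reduce that count. The names DrawItem, countStateChanges and sortByState are hypothetical and not part of Vulkan; the integer IDs only stand in for real handles, so treat this as a CPU-side model under those assumptions rather than a renderer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical per-draw record; the IDs stand in for pipeline and
// descriptor-set handles in a real renderer.
struct DrawItem {
    std::uint32_t pipelineId;
    std::uint32_t materialId;
    std::uint32_t meshId;
};

// Counts the binds a submission order would need: one per pipeline change
// and one per material change (the first draw always needs both).
std::size_t countStateChanges(const std::vector<DrawItem>& draws) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].pipelineId != draws[i - 1].pipelineId) ++changes;
        if (i == 0 || draws[i].materialId != draws[i - 1].materialId) ++changes;
    }
    return changes;
}

// Returns a copy ordered by (pipeline, material) so that draws sharing
// state become adjacent. stable_sort keeps the original order within a group.
std::vector<DrawItem> sortByState(std::vector<DrawItem> draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return std::tie(a.pipelineId, a.materialId) <
                                std::tie(b.pipelineId, b.materialId);
                     });
    return draws;
}
```

Comparing countStateChanges before and after sortByState gives a quick estimate of how much rebinding a frame could avoid. Sorting by state may conflict with ordering that transparency or depth sorting requires, so it would normally be applied per pass.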
================================================================================================
GPU‐Bound Performance Issues
\
\ **Shader Complexity**
\ \
\ \ Poor LOD / ALU‐heavy ops
\ \ Divergence / branching
\
\ **Memory Bandwidth**
\ \
\ \ Texture sampling stalls
\ \ VRAM contention / tiling issues
\
\ **Overdraw & Blending**
\ \
\ \ Excessive transparency / blending
\ \ Too much overdraw (fill‐rate bound)
\
\ **Rasterization Overhead**
\ \
\ \ Complex multisample / anti‐aliasing
\ \ Heavy fragment shading per pixel
\
\ **Synchronization**
\ \
\ \ GPU idles waiting on CPU submissions or semaphores
\ \ Render‐pass / subpass barriers
\
\ **Driver Resource Mgmt**
\ \
\ \ Inefficient resource transitions
\ \ Poor pipeline‐state creation / hashing
\ \ Descriptor‐set update overhead
\
\ **Constant / Push‐Constant Updates**
\ \
\ \ Push‐constant cache‐flush cost
\ \ GPU stalls waiting on new uniform data
\ \ Hidden pipeline stalls from small constant changes
\
\ **Inefficient Transitions** (e.g. too‐fine‐grained vkCmdPipelineBarrier)
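
One way to picture the "too-fine-grained vkCmdPipelineBarrier" item above is a small batcher that queues transitions and emits them together. vkCmdPipelineBarrier accepts arrays of barriers, so several image transitions can go into a single call. The sketch below models only the bookkeeping: Transition and BarrierBatcher are hypothetical names, the layouts are plain integers, and no Vulkan call is actually made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an image memory barrier; layouts are plain integers.
struct Transition {
    std::uint32_t imageId;
    std::uint32_t oldLayout;
    std::uint32_t newLayout;
};

// Queues transitions and emits them as one batch instead of one call each.
class BarrierBatcher {
public:
    void add(const Transition& t) { pending_.push_back(t); }

    // Returns the number of transitions flushed. A real renderer would
    // translate the pending entries into one vkCmdPipelineBarrier call here.
    std::size_t flush() {
        const std::size_t count = pending_.size();
        if (count > 0) {
            ++barrierCalls_;  // one batched call, however many transitions
            pending_.clear();
        }
        return count;
    }

    // Number of batched barrier calls that would have been recorded so far.
    std::size_t barrierCalls() const { return barrierCalls_; }

private:
    std::vector<Transition> pending_;
    std::size_t barrierCalls_ = 0;
};
```

Calling add for every transition a pass needs and flush once before the pass turns N barrier calls into one. An empty flush records nothing, so it can be called unconditionally.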