Skip to content

Instantly share code, notes, and snippets.

@sebbbi
sebbbi / FastUniformLoadWithWaveOps.txt
Last active April 26, 2025 10:27
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often run into a problem where you want to iterate an array in memory over all pixels in a compute shader
group (tile). Tiled deferred lighting is the most common case. 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]