@alexarmbr
Last active April 3, 2025 18:27
gpu profiling

Nsight Systems

Add torch.cuda.cudart().cudaProfilerStart() and torch.cuda.cudart().cudaProfilerStop() where profiling should start and stop, then launch the profiler with:

CUDA_VISIBLE_DEVICES=0,1,2,3 \
nsys profile \
-w true \
-t cuda,nvtx,osrt,cudnn,cublas \
-s cpu \
--capture-range=cudaProfilerApi \
--capture-range-end=stop \
--cudabacktrace=true \
--gpu-metrics-devices=cuda-visible \
--gpu-metrics-set=gh100 \
-x true \
-f true \
-o flux-schnell \
-e SOME_ENV_VAR=123 \
python flux-schnell.py

Add NVTX ranges by wrapping a region in a with torch.cuda.nvtx.range(f"step_{self.step_count}"): block, so each step shows up as a named range in the timeline.
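A minimal sketch of how the two pieces above might fit together in a training or inference loop. The helper names (nsys_capture, nvtx_range, run) and the step_fn callback are made up for illustration, and the hooks fall back to no-ops when CUDA is unavailable so the same script still runs on a CPU-only machine; that fallback is a design choice of this sketch, not something nsys requires.

```python
import contextlib

try:
    import torch
    HAVE_CUDA = torch.cuda.is_available()
except ImportError:  # lets the helpers import on a machine without PyTorch
    torch = None
    HAVE_CUDA = False


@contextlib.contextmanager
def nsys_capture():
    """Bracket the region nsys records when run with --capture-range=cudaProfilerApi."""
    if HAVE_CUDA:
        torch.cuda.cudart().cudaProfilerStart()
    try:
        yield
    finally:
        if HAVE_CUDA:
            torch.cuda.synchronize()  # let queued kernels finish inside the capture
            torch.cuda.cudart().cudaProfilerStop()


def nvtx_range(name):
    """Named NVTX range on CUDA machines, a no-op context manager elsewhere."""
    return torch.cuda.nvtx.range(name) if HAVE_CUDA else contextlib.nullcontext()


def run(step_fn, num_steps=8, warmup_steps=3):
    """Call step_fn(step) num_steps times; only the steps after warmup are captured."""
    for step in range(warmup_steps):
        step_fn(step)  # not captured: keeps one-off setup cost out of the report
    captured = []
    with nsys_capture():
        for step in range(warmup_steps, num_steps):
            with nvtx_range(f"step_{step}"):
                step_fn(step)
            captured.append(step)
    return captured
```

Here step_fn would be whatever runs one iteration of the model, for example lambda step: model(x).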

Torch Profiler

Starting to like this one better: it is easier to associate profiling results with general regions of the model. It spits out an enormous JSON trace, but you can view it in VS Code with the TensorBoard plugin, so there is no need to download it from the server. If you do download it, view the trace at chrome://tracing/. Tutorial here
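For a quick look without any viewer, the JSON trace can also be summarized directly. This is only a sketch: it assumes the file follows the Chrome trace event format (a top-level "traceEvents" list whose "complete" events have ph == "X" and a "dur" in microseconds), and top_ops is a made-up helper name.

```python
import gzip
import json
from collections import Counter


def top_ops(trace_path, n=10):
    """Return the n event names with the largest total duration (microseconds)."""
    opener = gzip.open if str(trace_path).endswith(".gz") else open
    with opener(trace_path, "rt") as f:
        trace = json.load(f)
    # Assumption: Chrome trace format, either {"traceEvents": [...]} or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = Counter()
    for ev in events:
        if ev.get("ph") == "X":  # "complete" events carry a duration
            totals[ev["name"]] += ev.get("dur", 0)
    return totals.most_common(n)
```

The totals are inclusive: an event that wraps other events counts their time too, so treat the ranking as a rough guide rather than an exact breakdown.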

import torch

# Do nothing for five steps (wait), run the profiler but discard its results for
# the next step (warmup), then record the three steps after that (active).
# repeat=0 does not mean "run one cycle": it repeats the cycle until profiling
# stops; pass repeat=1 to record a single cycle. The trace for a cycle is written
# to disk once its wait + warmup + active steps have passed. This may take a long
# time, but overhead while the profiler is running is minimal.
prof = torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=5, warmup=1, active=3, repeat=0),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/mymodel'),
    record_shapes=True,
    with_stack=True)

prof.start()

# model and x are assumed to be defined elsewhere
for i in range(10):
  y = model(x)

  # tell the profiler that a step has finished
  prof.step()

prof.stop()
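To see which steps actually get recorded, here is a small pure-Python stand-in for the schedule logic. It is a re-implementation for illustration based on the documented behavior of torch.profiler.schedule (ignoring skip_first), not the library's own code, and the phase function and its string labels are made up.

```python
def phase(step, wait, warmup, active, repeat=0):
    """Phase the profiler would be in at a 0-based step (illustrative only)."""
    cycle = wait + warmup + active
    if repeat > 0 and step >= repeat * cycle:
        return "none"  # all requested cycles are finished
    pos = step % cycle
    if pos < wait:
        return "none"
    if pos < wait + warmup:
        return "warmup"
    # the last active step of a cycle is also the one that triggers on_trace_ready
    return "record" if pos < cycle - 1 else "record_and_save"


# Phases for the ten steps of the loop above.
phases = [phase(step, wait=5, warmup=1, active=3) for step in range(10)]
for step, name in enumerate(phases):
    print(step, name)
```

With wait=5, warmup=1, active=3 the first five steps are idle, step 5 is warmup, steps 6 to 8 are recorded, and step 9 is idle again because with repeat=0 a new cycle has started.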