This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| 1) Clone and build WebKit | |
| git clone https://github.com/WebKit/WebKit.git WebKit | |
| cd WebKit | |
| Tools/Scripts/build-webkit -cmakeargs="-DENABLE_WEBGPU_BY_DEFAULT=1" --debug | |
| 2) Run your app | |
| __XPC_METAL_CAPTURE_ENABLED=1 Tools/Scripts/run-minibrowser --debug --url http://localhost:5000/index.html#/loaders/gsplat |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| # This isn't supposed to run as a bash script, i named it with ".sh" for syntax highlighting. | |
| # https://developer.nvidia.com/nsight-systems | |
| # https://docs.nvidia.com/nsight-systems/profiling/index.html | |
| # My preferred nsys (command line executable used to create profiles) commands | |
| # | |
| # In your script, write | |
| # torch.cuda.nvtx.range_push("region name") | |
| # ... |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Latency Comparison Numbers (~2012) | |
| ---------------------------------- | |
| L1 cache reference 0.5 ns | |
| Branch mispredict 5 ns | |
| L2 cache reference 7 ns 14x L1 cache | |
| Mutex lock/unlock 25 ns | |
| Main memory reference 100 ns 20x L2 cache, 200x L1 cache | |
| Compress 1K bytes with Zippy 3,000 ns 3 us | |
| Send 1K bytes over 1 Gbps network 10,000 ns 10 us | |
| Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD |