@methodmissing
Created December 7, 2018 01:01
lourens@CarbonX1:~/src/optcarrot$ perf record -e cycles:u -j any,u -o perf.data -- ~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux -r./tools/shim.rb bin/optcarrot --benchmark --frames 10000 examples/Lan_Master.nes
fps: 41.020819873462244
checksum: 60838
[ perf record: Woken up 2840 times to write data ]
[kernel.kallsyms] with build id c8b95745cc1ba18edca26befae83a11e956471d1 not found, continuing without symbols
[ perf record: Captured and wrote 710.340 MB perf.data (908934 samples) ]
lourens@CarbonX1:~/src/optcarrot$ perf2bolt -p perf.data -o perf.fdata ~/src/ruby/ruby/ruby
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: spawning perf job to read branch events
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x600000, offset 0x600000
BOLT-INFO: enabling relocation mode
BOLT-INFO: binary build-id is: c12ca900458fb079a177c840b82858846ec73194
PERF2BOLT: spawning perf job to read buildid list
PERF2BOLT: matched build-id and file name
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function error_handle/eval.c/1(*2)
PERF2BOLT: waiting for perf mmap events collection to finish...
PERF2BOLT: parsing perf-script mmap events output
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 1 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: aggregating branch events...
PERF2BOLT: read 908899 samples and 29084480 LBR entries
PERF2BOLT: 35 samples (0.0%) were ignored
PERF2BOLT: traces mismatching disassembled function contents: 201334 (0.7%)
PERF2BOLT: out of range traces involving unknown regions: 72349 (0.3%)
PERF2BOLT: wrote 6916 objects and 0 memory objects to perf.fdata
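This is a quick sanity check on the perf2bolt summary. Dividing the LBR entries by the samples read gives the average number of branch records per sample. The result, about 32, is consistent with `-j any,u` capturing a full 32-entry LBR stack on every sample. That is an assumption about this CPU, because the log does not state the LBR depth. `lbr_per_sample` is a hypothetical helper.

```sh
#!/bin/sh
# Average LBR stack depth per sample, using the figures from the line
# "read 908899 samples and 29084480 LBR entries" above.
# lbr_per_sample is a hypothetical helper for illustration only.
lbr_per_sample() {
  awk -v samples="$1" -v entries="$2" 'BEGIN { printf "%.1f\n", entries / samples }'
}

lbr_per_sample 908899 29084480
```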
lourens@CarbonX1:~/src/optcarrot$ llvm-bolt ~/src/ruby/ruby/ruby -o ~/src/ruby/ruby/ruby.bolt -data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats -align-blocks -align-macro-fusion=hot -peepholes=all -inline-memcpy -print-cache-metrics -frame-opt=hot -optimize-bodyless-functions
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x600000, offset 0x600000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function error_handle/eval.c/1(*2)
BOLT-INFO: 508 functions out of 5506 simple functions (9.2%) have non-empty execution profile.
BOLT-INFO: 39 non-simple function(s) have profile.
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: the input contains 779 (dynamic count : 57698) missed opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: removed 442 'repz' prefixes with estimated execution count of 131015 times.
BOLT-INFO: inlined 342 memcpy() calls. The calls were executed 72 times based on profile.
BOLT-INFO: Peephole: 0 instructions shortened.
BOLT-INFO: Peephole: 6 double jumps patched.
BOLT-INFO: Peephole: 38 tail call traps inserted.
BOLT-INFO: Peephole: 1 useless conditional branches removed.
BOLT-INFO: optimized 124 redirect call sites to eliminate 23 dynamic calls.
BOLT-INFO: basic block reordering modified layout of 337 (5.77%) functions
BOLT-INFO: Peephole: 0 instructions shortened.
BOLT-INFO: Peephole: 0 double jumps patched.
BOLT-INFO: Peephole: 0 tail call traps inserted.
BOLT-INFO: Peephole: 0 useless conditional branches removed.
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: running hfsort+ for 522 functions
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:
8124979 : executed forward branches
2554814 : taken forward branches
2209363 : executed backward branches
1626342 : taken backward branches
551613 : executed unconditional branches
1099524 : all function calls
198786 : indirect calls
49281 : PLT calls
90043992 : executed instructions
24764541 : executed load instructions
19173048 : executed store instructions
232 : taken jump table branches
10885955 : total branches
4732769 : taken branches
6153186 : non-taken conditional branches
4181156 : taken conditional branches
10334342 : all conditional branches
10104284 : executed forward branches (+24.4%)
210408 : taken forward branches (-91.8%)
230058 : executed backward branches (-89.6%)
180792 : taken backward branches (-88.9%)
18565 : executed unconditional branches (-96.6%)
1099382 : all function calls (-0.0%)
198716 : indirect calls (-0.0%)
49211 : PLT calls (-0.1%)
89581363 : executed instructions (-0.5%)
24764471 : executed load instructions (-0.0%)
19173048 : executed store instructions (=)
232 : taken jump table branches (=)
10352907 : total branches (-4.9%)
409765 : taken branches (-91.3%)
9943142 : non-taken conditional branches (+61.6%)
391200 : taken conditional branches (-90.6%)
10334342 : all conditional branches (=)
BOLT-INFO: SCTC: patched 151 tail calls (137 forward) tail calls (14 backward) from a total of 151 while removing 3 double jumps and removing 113 basic blocks totalling 565 bytes of code. CTCs total execution count is 146094 and the number of times CTCs are taken is 77520.
BOLT-INFO: FOP optimized 0 redundant load(s) and 0 unused store(s)
BOLT-INFO: FOP changed 0 load(s) to use a register instead of a stack access, and 0 to use an immediate.
BOLT-INFO: FOP deleted 0 load(s) and 0 store(s).
BOLT-INFO: FRAME ANALYSIS: 339 function(s) (3.3% dyn cov) were not optimized.
BOLT-INFO: FRAME ANALYSIS: 2443 function(s) (23.3% dyn cov) could not have its frame indices restored.
BOLT-INFO: Shrink wrapping moved 4 spills inserting load/stores and 2 spills inserting push/pops
BOLT-INFO: Allocation combiner: 9 empty spaces coalesced.
BOLT-INFO: cache metrics after emitting functions:
There are 5845 functions; 522 (8.93%) are in the hot section, 547 (9.36%) have profile
There are 121954 basic blocks; 11012 (9.03%) are in the hot section
Hot code takes -nan% of binary (0 bytes out of 0, 0.00 huge pages)
Expected i-TLB cache hit ratio: 100.00%
TSP score: 71233725
ExtTSP score: 71233725
BOLT-INFO: setting _end to 0xcd7860
BOLT-INFO: setting _end to 0xcd7860
BOLT-INFO: patched build-id (flipped last bit)
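The percentages in the second dynostats block above are relative changes against the first block, i.e. (after - before) / before. The minimal sketch below recomputes a few of them from the logged counts. The helper name `pct_change` is hypothetical, and its formatting only approximates BOLT's. The same counts also show that BOLT did not remove conditional branches. Taken plus non-taken conditional branches sum to 10334342 in both blocks. The drop in taken conditionals (4181156 to 391200) is exactly matched by the rise in non-taken ones (6153186 to 9943142). The drop in taken branches therefore comes from turning taken jumps into fall-throughs.

```sh
#!/bin/sh
# Relative change as printed next to each dynostat, e.g. "+24.4%", or "="
# when the count is unchanged. pct_change is a hypothetical helper.
pct_change() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    if (before == after) { print "="; exit }
    printf "%+.1f%%\n", (after - before) * 100 / before
  }'
}

# (before, after) pairs copied from the dynostats blocks above.
pct_change 8124979 10104284    # executed forward branches
pct_change 4732769 409765      # taken branches
pct_change 19173048 19173048   # executed store instructions

# Conditional branches are conserved: taken + non-taken is the same total
# before and after block reordering.
echo $((4181156 + 6153186))    # before
echo $((391200 + 9943142))     # after
```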
lourens@CarbonX1:~/src/optcarrot$ benchmark-driver -e "bolt::~/src/ruby/ruby/ruby.bolt -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux --disable-gems" -e "trunk::~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux --disable-gems" -v --repeat-count 24 benchmark.yml
bolt: ruby 2.6.0dev (2018-12-07 trunk 66259) [x86_64-linux]
trunk: ruby 2.6.0dev (2018-12-07 trunk 66259) [x86_64-linux]
Calculating -------------------------------------
                  bolt       trunk
 optcarrot      47.350      45.496 fps
Comparison:
                     optcarrot
               bolt:      47.4 fps
              trunk:      45.5 fps - 1.04x slower
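The "1.04x slower" figure can be reproduced by dividing the faster fps result by the slower one. This sketch assumes that is how benchmark-driver derives the ratio for a higher-is-better metric like fps. Expressed as a throughput gain, the BOLT-optimized binary renders roughly 4% more frames per second than trunk on this run. `ratio` and `gain_pct` are hypothetical helpers.

```sh
#!/bin/sh
# Speedup of the faster build over the slower one, from the fps figures
# reported above (bolt 47.350, trunk 45.496). Hypothetical helpers.
ratio() {
  awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2fx\n", fast / slow }'
}

# The same comparison expressed as a percentage increase in fps.
gain_pct() {
  awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f%%\n", (fast / slow - 1) * 100 }'
}

ratio 47.350 45.496
gain_pct 47.350 45.496
```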