Created
December 7, 2018 01:01
lourens@CarbonX1:~/src/optcarrot$ perf record -e cycles:u -j any,u -o perf.data -- ~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux -r./tools/shim.rb bin/optcarrot --benchmark --frames 10000 examples/Lan_Master.nes
fps: 41.020819873462244
checksum: 60838
[ perf record: Woken up 2840 times to write data ]
[kernel.kallsyms] with build id c8b95745cc1ba18edca26befae83a11e956471d1 not found, continuing without symbols
[ perf record: Captured and wrote 710.340 MB perf.data (908934 samples) ]
lourens@CarbonX1:~/src/optcarrot$ perf2bolt -p perf.data -o perf.fdata ~/src/ruby/ruby/ruby
PERF2BOLT: Starting data aggregation job for perf.data
PERF2BOLT: spawning perf job to read branch events
PERF2BOLT: spawning perf job to read mem events
PERF2BOLT: spawning perf job to read process events
PERF2BOLT: spawning perf job to read task events
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x600000, offset 0x600000
BOLT-INFO: enabling relocation mode
BOLT-INFO: binary build-id is: c12ca900458fb079a177c840b82858846ec73194
PERF2BOLT: spawning perf job to read buildid list
PERF2BOLT: matched build-id and file name
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function error_handle/eval.c/1(*2)
PERF2BOLT: waiting for perf mmap events collection to finish...
PERF2BOLT: parsing perf-script mmap events output
PERF2BOLT: waiting for perf task events collection to finish...
PERF2BOLT: parsing perf-script task events output
PERF2BOLT: input binary is associated with 1 PID(s)
PERF2BOLT: waiting for perf events collection to finish...
PERF2BOLT: aggregating branch events...
PERF2BOLT: read 908899 samples and 29084480 LBR entries
PERF2BOLT: 35 samples (0.0%) were ignored
PERF2BOLT: traces mismatching disassembled function contents: 201334 (0.7%)
PERF2BOLT: out of range traces involving unknown regions: 72349 (0.3%)
PERF2BOLT: wrote 6916 objects and 0 memory objects to perf.fdata
lourens@CarbonX1:~/src/optcarrot$ llvm-bolt ~/src/ruby/ruby/ruby -o ~/src/ruby/ruby/ruby.bolt -data=perf.fdata -reorder-blocks=cache+ -reorder-functions=hfsort+ -split-functions=3 -split-all-cold -split-eh -dyno-stats -align-blocks -align-macro-fusion=hot -peepholes=all -inline-memcpy -print-cache-metrics -frame-opt=hot -optimize-bodyless-functions
BOLT-INFO: Target architecture: x86_64
BOLT-INFO: shared object or position-independent executable detected
BOLT-INFO: first alloc address is 0x0
BOLT-INFO: creating new program header table at address 0x600000, offset 0x600000
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move as PIC jump table was detected in function error_handle/eval.c/1(*2)
BOLT-INFO: 508 functions out of 5506 simple functions (9.2%) have non-empty execution profile.
BOLT-INFO: 39 non-simple function(s) have profile.
BOLT-INFO: profile for 1 objects was ignored
BOLT-INFO: the input contains 779 (dynamic count : 57698) missed opportunities for macro-fusion optimization. Will fix instances on a hot path.
BOLT-INFO: removed 442 'repz' prefixes with estimated execution count of 131015 times.
BOLT-INFO: inlined 342 memcpy() calls. The calls were executed 72 times based on profile.
BOLT-INFO: Peephole: 0 instructions shortened.
BOLT-INFO: Peephole: 6 double jumps patched.
BOLT-INFO: Peephole: 38 tail call traps inserted.
BOLT-INFO: Peephole: 1 useless conditional branches removed.
BOLT-INFO: optimized 124 redirect call sites to eliminate 23 dynamic calls.
BOLT-INFO: basic block reordering modified layout of 337 (5.77%) functions
BOLT-INFO: Peephole: 0 instructions shortened.
BOLT-INFO: Peephole: 0 double jumps patched.
BOLT-INFO: Peephole: 0 tail call traps inserted.
BOLT-INFO: Peephole: 0 useless conditional branches removed.
BOLT-INFO: UCE removed 0 blocks and 0 bytes of code.
BOLT-INFO: running hfsort+ for 522 functions
BOLT-INFO: program-wide dynostats after all optimizations before SCTC and FOP:

            8124979 : executed forward branches
            2554814 : taken forward branches
            2209363 : executed backward branches
            1626342 : taken backward branches
             551613 : executed unconditional branches
            1099524 : all function calls
             198786 : indirect calls
              49281 : PLT calls
           90043992 : executed instructions
           24764541 : executed load instructions
           19173048 : executed store instructions
                232 : taken jump table branches
           10885955 : total branches
            4732769 : taken branches
            6153186 : non-taken conditional branches
            4181156 : taken conditional branches
           10334342 : all conditional branches

           10104284 : executed forward branches (+24.4%)
             210408 : taken forward branches (-91.8%)
             230058 : executed backward branches (-89.6%)
             180792 : taken backward branches (-88.9%)
              18565 : executed unconditional branches (-96.6%)
            1099382 : all function calls (-0.0%)
             198716 : indirect calls (-0.0%)
              49211 : PLT calls (-0.1%)
           89581363 : executed instructions (-0.5%)
           24764471 : executed load instructions (-0.0%)
           19173048 : executed store instructions (=)
                232 : taken jump table branches (=)
           10352907 : total branches (-4.9%)
             409765 : taken branches (-91.3%)
            9943142 : non-taken conditional branches (+61.6%)
             391200 : taken conditional branches (-90.6%)
           10334342 : all conditional branches (=)

BOLT-INFO: SCTC: patched 151 tail calls (137 forward) tail calls (14 backward) from a total of 151 while removing 3 double jumps and removing 113 basic blocks totalling 565 bytes of code. CTCs total execution count is 146094 and the number of times CTCs are taken is 77520.
BOLT-INFO: FOP optimized 0 redundant load(s) and 0 unused store(s)
BOLT-INFO: FOP changed 0 load(s) to use a register instead of a stack access, and 0 to use an immediate.
BOLT-INFO: FOP deleted 0 load(s) and 0 store(s).
BOLT-INFO: FRAME ANALYSIS: 339 function(s) (3.3% dyn cov) were not optimized.
BOLT-INFO: FRAME ANALYSIS: 2443 function(s) (23.3% dyn cov) could not have its frame indices restored.
BOLT-INFO: Shrink wrapping moved 4 spills inserting load/stores and 2 spills inserting push/pops
BOLT-INFO: Allocation combiner: 9 empty spaces coalesced.
BOLT-INFO: cache metrics after emitting functions:
  There are 5845 functions; 522 (8.93%) are in the hot section, 547 (9.36%) have profile
  There are 121954 basic blocks; 11012 (9.03%) are in the hot section
  Hot code takes -nan% of binary (0 bytes out of 0, 0.00 huge pages)
  Expected i-TLB cache hit ratio: 100.00%
  TSP score: 71233725
  ExtTSP score: 71233725
BOLT-INFO: setting _end to 0xcd7860
BOLT-INFO: setting _end to 0xcd7860
BOLT-INFO: patched build-id (flipped last bit)
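
The percentage deltas in the dyno-stats above can be cross-checked by hand. The sketch below recomputes a few of them from before/after counts copied out of the log. The `delta_pct` helper and the `DYNO_STATS` layout are illustrative names chosen here, not part of BOLT.

```python
# Before/after counts copied from the dyno-stats printed by llvm-bolt above.
# The helper name and the data layout are illustrative, not BOLT's own.
DYNO_STATS = {
    "executed forward branches": (8124979, 10104284),
    "taken forward branches": (2554814, 210408),
    "executed unconditional branches": (551613, 18565),
    "executed instructions": (90043992, 89581363),
    "total branches": (10885955, 10352907),
    "taken branches": (4732769, 409765),
    "non-taken conditional branches": (6153186, 9943142),
}


def delta_pct(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for name, (before, after) in DYNO_STATS.items():
    print(f"{after:>19} : {name} ({delta_pct(before, after):+.1f}%)")
```

Read this way, the total number of conditional branches is unchanged, while the share that is taken drops by roughly 90%. That pattern is consistent with the block reordering having turned most hot branches into fall-throughs, though the log itself does not state the cause.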
lourens@CarbonX1:~/src/optcarrot$ benchmark-driver -e "bolt::~/src/ruby/ruby/ruby.bolt -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux --disable-gems" -e "trunk::~/src/ruby/ruby/ruby -I~/src/ruby/ruby/lib -I~/src/ruby/ruby/. -I~/src/ruby/ruby/.ext/x86_64-linux --disable-gems" -v --repeat-count 24 benchmark.yml
bolt: ruby 2.6.0dev (2018-12-07 trunk 66259) [x86_64-linux]
trunk: ruby 2.6.0dev (2018-12-07 trunk 66259) [x86_64-linux]
Calculating -------------------------------------
                           bolt       trunk
           optcarrot     47.350      45.496 fps

Comparison:
                        optcarrot
                bolt:        47.4 fps
               trunk:        45.5 fps - 1.04x slower
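
The "1.04x slower" figure follows directly from the two fps values reported by benchmark-driver. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Frames per second reported by benchmark-driver in the log above.
BOLT_FPS = 47.350
TRUNK_FPS = 45.496

# Ratio of the BOLT-optimized binary to the unmodified trunk build.
ratio = BOLT_FPS / TRUNK_FPS
gain_pct = (ratio - 1.0) * 100.0

print(f"bolt / trunk = {ratio:.3f}x ({gain_pct:+.1f}%)")
```

This is a single benchmark on a single machine, so the roughly 4% gain should be read as indicative rather than as a general result.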