Metric | With Preload | Without Preload | Delta (Abs.) | Delta (%) | Interpretation |
---|---|---|---|---|---|
Time Elapsed (s) | ~337.14 | ~296.96 | ~+40.18 s | +13.53% | Overall slowdown with preload. |
task-clock (CPU s) | ~300.28 | ~259.31 | ~+40.97 s | +15.80% | More CPU time used, indicating CPU-bound work or stalls. |
cycles (B) | ~1,630 | ~1,405 | ~+225 B | +16.01% | More CPU cycles spent, largely due to stalls. |
instructions (B) | ~1,420 | ~1,396 | ~+24 B | +1.72% | Slightly more instructions; amount of actual work is very similar. |
IPC (Insn per Cycle) | 0.87 | 0.99 | -0.12 | -12.12% | Lower CPU efficiency with preload; CPU stalled more often. |
cache-references (B) | ~19.57 | ~17.36 | ~+2.21 B | +12.73% | More accesses to the cache hierarchy. |
cache-misses (B) | ~5.21 | ~4.20 | ~+1.01 B | +24.05% | CRITICAL: Huge increase in (likely LLC) misses. Strong evidence of cache pollution by `memset`. |
L1-dcache-load-misses (B) | ~7.65 | ~7.26 | ~+0.39 B | +5.37% | Moderate increase in L1 data cache misses. |
L1-icache-load-misses (M) | ~78.7 | ~95.5 | ~-16.8 M | -17.59% | Fewer L1 instruction cache misses with preload; not a dominant factor. |
page-faults (M) | ~7.34 | ~0.70 | ~+6.64 M | +945.8% | Preload faults all pages upfront; no-preload faults on demand. The percentage is high because the no-preload base is low. |
minor-faults (M) | ~7.34 | ~0.70 | ~+6.64 M | +945.8% | Same as page-faults; reflects `memset` activity on ~30GB. |
dTLB-load-misses (B) | ~1.23 | ~0.58 | ~+0.65 B | +112.07% | CRITICAL: dTLB misses more than doubled. `memset` on a huge region thrashes the TLB. |
iTLB-load-misses (M) | ~26.0 | ~25.0 | ~+1.0 M | +4.00% | Minor difference in instruction TLB misses. |
branch-misses (B) | ~20.58 | ~19.97 | ~+0.61 B | +3.05% | Slightly more branch misses with preload. |
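
The Delta and IPC columns follow from the raw counters by simple arithmetic. A minimal Python sketch, using a few values copied from the table above (the helper name `delta` and the dictionary keys are illustrative, not part of the original analysis):

```python
def delta(with_preload: float, without_preload: float) -> tuple[float, float]:
    """Return (absolute delta, percent delta) relative to the no-preload run."""
    absolute = with_preload - without_preload
    percent = 100.0 * absolute / without_preload
    return absolute, percent

# (with preload, without preload), copied from the table above.
rows = {
    "time_elapsed_s": (337.14, 296.96),
    "cycles": (1630.0, 1405.0),
    "instructions": (1420.0, 1396.0),
    "cache_misses": (5.21, 4.20),
    "dtlb_load_misses": (1.23, 0.58),
}

for name, (w, wo) in rows.items():
    absolute, percent = delta(w, wo)
    print(f"{name}: {absolute:+.2f} ({percent:+.2f}%)")

# IPC is instructions divided by cycles, so the counter unit cancels out.
ipc_with = rows["instructions"][0] / rows["cycles"][0]
ipc_without = rows["instructions"][1] / rows["cycles"][1]
print(f"IPC: {ipc_with:.2f} vs {ipc_without:.2f}")
```

The IPC delta in the table (-12.12%) is computed from the rounded values 0.87 and 0.99; with the unrounded ratios it comes out at about -12.3%, which does not change the interpretation.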
Summary of Findings (with Percentages):
- Preloading with `memset` resulted in a ~13.5% increase in overall execution time.
- This slowdown is strongly correlated with:
  - A 16.0% increase in CPU cycles.
  - A 12.1% decrease in Instructions Per Cycle (IPC), indicating reduced CPU efficiency.
  - A 24.0% increase in `cache-misses` (likely LLC misses), a major bottleneck.
  - A 112.1% increase in `dTLB-load-misses`, indicating heavy pressure on the data TLB.
- The +945.8% increase in `page-faults` for the preload case is expected, as it reflects the upfront faulting of all ~7.3 million pages by `memset`. Although the percentage is large, the cost comes mainly from the subsequent cache and TLB pollution rather than from the faulting time itself.
- The data shows that memory-system performance (caches and TLB) is severely degraded in the preload scenario for this workload, leading to a significant performance loss.
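
As a sanity check on the page-fault counts, the number of faults can be compared with the ~30GB region size. The sketch below assumes 4 KiB base pages and exactly one minor fault per touched page. The page size is not stated in the measurements, so treat it as an assumption; the function name `touched_bytes` is illustrative.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB base pages (not stated in the source)

faults_with_preload = 7.34e6     # page-faults with preload, from the table
faults_without_preload = 0.70e6  # page-faults without preload, from the table

def touched_bytes(faults: float, page_size: int = PAGE_SIZE) -> float:
    """Bytes faulted in, assuming each fault maps exactly one page."""
    return faults * page_size

gb_with = touched_bytes(faults_with_preload) / 1e9
gb_without = touched_bytes(faults_without_preload) / 1e9
print(f"preload:    ~{gb_with:.1f} GB faulted in")
print(f"no preload: ~{gb_without:.1f} GB faulted in")
```

Under that assumption, the preload run faults in roughly 30 GB, which matches the ~30GB region, while the on-demand run faults in only about a tenth of it.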