Created November 15, 2014 20:41 (gist twillouer/ac13eb1dadc8a270f821)
Benchmarking of toArray
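```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

@State(Scope.Benchmark)
public class ToArrayBench {
    ArrayList<Byte> list;

    @Setup
    public void setup() throws Throwable
    {
        list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add((byte) i);
        }
    }

    // NOTE: none of the benchmarks below use their result, so the JIT is free
    // to eliminate part of the work (dead-code elimination).

    // Too-small array: toArray allocates a new Byte[10] itself
    @Benchmark
    public void zero_sized_array()
    {
        list.toArray(new Byte[0]);
    }

    // Untyped copy: returns an Object[]
    @Benchmark
    public void simple_toArray()
    {
        list.toArray();
    }

    // Exactly sized array, filled in place
    @Benchmark
    public void sized_array_from_list()
    {
        list.toArray(new Byte[list.size()]);
    }

    // Oversized array (100 slots for 10 elements), filled in place
    @Benchmark
    public void sized_array_fixed_size()
    {
        list.toArray(new Byte[100]);
    }

    // Copy the whole list instead of converting it to an array
    @Benchmark
    public void defensive_copy()
    {
        new ArrayList<>(list);
    }

    public static void main(String[] args) throws RunnerException, IOException
    {
        Options opt = new OptionsBuilder().include(".*" + ToArrayBench.class.getSimpleName() + ".*")
                .warmupIterations(20)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(20)
                .timeUnit(TimeUnit.MILLISECONDS)
                .forks(1)
                // .addProfiler(LinuxPerfProfiler.class)
                .build();
        new Runner(opt).run();
    }
}
```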
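The variants in this first version do not all hand back the same kind of array, which matters when comparing them. The plain-Java sketch below (no JMH; the class name `ToArrayVariants` is made up for illustration) replays the calls outside the harness with the same 10-element list. Per the `Collection.toArray(T[])` contract, an array that is too small causes a new one of the same runtime type and exact size to be allocated, while a large-enough array is filled in place and returned, with the slot right after the last element set to `null`. The oversized array is pre-filled with 42 so that the null marker is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: replays the benchmark bodies outside JMH to inspect their results.
class ToArrayVariants {

    // Same contents as setup(): the bytes 0..9.
    static List<Byte> tenBytes() {
        List<Byte> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add((byte) i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Byte> list = tenBytes();

        // Too small: a fresh Byte[] of exactly list.size() is allocated.
        Byte[] zero = list.toArray(new Byte[0]);
        System.out.println(zero.length);                  // 10

        // Exactly sized: the argument itself is filled and returned.
        Byte[] exact = new Byte[list.size()];
        System.out.println(list.toArray(exact) == exact); // true

        // Oversized (as in sized_array_fixed_size): filled in place,
        // element [size] becomes null, the rest is left untouched.
        Byte[] big = new Byte[100];
        Arrays.fill(big, (byte) 42);
        Byte[] result = list.toArray(big);
        System.out.println(result == big);                // true
        System.out.println(result[10]);                   // null
        System.out.println(result[11]);                   // 42
    }
}
```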
This is giving me similarly confusing results:
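```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.profile.LinuxPerfProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

@State(Scope.Thread)
public class ToArrayBench {
    @Param("10")
    private int size;
    private Byte[] buffer;

    @Setup
    public void setup() throws Throwable {
        buffer = new Byte[size];
        // int counter: a byte counter would wrap to -128 after 127 and fail for size > 127
        for (int i = 0; i < size; i++) {
            buffer[i] = (byte) i;
        }
    }

    // Typed copy in a single Arrays.copyOf call
    @Benchmark
    public void fast(Blackhole bh) {
        int s = buffer.length;
        Byte[] copy = Arrays.copyOf(buffer, s, Byte[].class);
        bh.consume(copy);
    }

    // Same result: reflective allocation followed by a separate arraycopy
    @Benchmark
    public void slow(Blackhole bh) {
        int s = buffer.length;
        Byte[] copy = (Byte[]) Array.newInstance(Byte[].class.getComponentType(), s);
        System.arraycopy(buffer, 0, copy, 0, s);
        bh.consume(copy);
    }

    public static void main(String[] args) throws Throwable {
        Options opt = new OptionsBuilder().include(".*" + ToArrayBench.class.getSimpleName() + ".*")
                .warmupIterations(10)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(20)
                .timeUnit(TimeUnit.MILLISECONDS)
                .threads(1)
                .forks(1)
                .addProfiler(LinuxPerfProfiler.class) // requires Linux perf
                .build();
        new Runner(opt).run();
    }
}
```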
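Before reading the numbers, note that `fast()` and `slow()` produce the same result: a new `Byte[]` holding a copy of `buffer`. The plain-Java sketch below (no JMH; `CopyVariants` is a hypothetical name) checks that the two paths are observably equivalent, so any throughput gap must come from how the JIT compiles them rather than from doing different work. As far as we know, HotSpot compiles `Arrays.copyOf` as an intrinsic, which would be consistent with it being the faster path, but the measurements alone do not prove that this is the cause.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

// Hypothetical helper: the two benchmark bodies as plain methods returning their copy.
class CopyVariants {

    // Same body as fast(): typed copy in one call.
    static Byte[] viaCopyOf(Byte[] buffer) {
        return Arrays.copyOf(buffer, buffer.length, Byte[].class);
    }

    // Same body as slow(): reflective allocation, then a separate bulk copy.
    static Byte[] viaNewInstance(Byte[] buffer) {
        int s = buffer.length;
        Byte[] copy = (Byte[]) Array.newInstance(Byte[].class.getComponentType(), s);
        System.arraycopy(buffer, 0, copy, 0, s);
        return copy;
    }

    public static void main(String[] args) {
        Byte[] buffer = new Byte[10];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (byte) i;
        }
        Byte[] a = viaCopyOf(buffer);
        Byte[] b = viaNewInstance(buffer);
        System.out.println(Arrays.equals(a, b));          // true: same contents
        System.out.println(a.getClass() == b.getClass()); // true: both Byte[]
        System.out.println(a != buffer && b != buffer);   // true: both are real copies
    }
}
```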
Benchmark | (size) | Mode | Samples | Score | Score error | Units |
---|---|---|---|---|---|---|
o.o.j.s.ToArrayBench.fast | 10 | thrpt | 20 | 134199.249 | 15573.195 | ops/ms |
o.o.j.s.ToArrayBench.fast:@cpi | 10 | thrpt | 1 | 0.367 | NaN | CPI |
o.o.j.s.ToArrayBench.slow | 10 | thrpt | 20 | 49499.140 | 2661.012 | ops/ms |
o.o.j.s.ToArrayBench.slow:@cpi | 10 | thrpt | 1 | 0.764 | NaN | CPI |
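Throughput in ops/ms is often easier to reason about as time per call. Since 1 ms = 1,000,000 ns, ns/op = 1,000,000 / (ops/ms). The sketch below (hypothetical `ThroughputMath` class, plain Java) applies this to the two scores: about 7.45 ns per `fast()` call versus about 20.20 ns per `slow()` call, a throughput ratio of roughly 2.71.

```java
import java.util.Locale;

// Hypothetical helper for converting JMH throughput scores.
class ThroughputMath {

    // 1 ms = 1_000_000 ns, so time per op = 1e6 / (ops per ms).
    static double nsPerOp(double opsPerMs) {
        return 1_000_000.0 / opsPerMs;
    }

    public static void main(String[] args) {
        double fast = 134199.249; // ops/ms, from the table above
        double slow = 49499.140;  // ops/ms
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "fast: %.2f ns/op%n", nsPerOp(fast));
        System.out.printf(Locale.ROOT, "slow: %.2f ns/op%n", nsPerOp(slow));
        System.out.printf(Locale.ROOT, "fast/slow throughput ratio: %.2f%n", fast / slow);
    }
}
```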
perf stats for fast():
```
23044.197935 task-clock (msec) # 0.632 CPUs utilized
14,410 context-switches # 0.625 K/sec
3,227 cpu-migrations # 0.140 K/sec
423 page-faults # 0.018 K/sec
78,964,852,601 cycles # 3.427 GHz [30.93%]
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
214,945,315,883 instructions # 2.72 insns per cycle [38.77%]
40,570,072,182 branches # 1760.533 M/sec [39.00%]
8,633,173 branch-misses # 0.02% of all branches [38.99%]
55,630,805,896 L1-dcache-loads # 2414.092 M/sec [39.18%]
2,727,361,923 L1-dcache-load-misses # 4.90% of all L1-dcache hits [38.89%]
1,403,751,595 LLC-loads # 60.916 M/sec [30.76%]
<not supported> LLC-load-misses:HG
<not supported> L1-icache-loads:HG
10,486,830 L1-icache-load-misses:HG # 0.00% of all L1-icache hits [31.88%]
54,778,270,818 dTLB-loads:HG # 2377.096 M/sec [31.78%]
584,070 dTLB-load-misses:HG # 0.00% of all dTLB cache hits [31.70%]
23,538,830 iTLB-loads:HG # 1.021 M/sec [31.62%]
312,318 iTLB-load-misses:HG # 1.33% of all iTLB cache hits [31.64%]
<not supported> L1-dcache-prefetches:HG
0 L1-dcache-prefetch-misses:HG # 0.000 K/sec [31.57%]
36.459311589 seconds time elapsed
```
perf stats for slow():
```
23121.720993 task-clock (msec) # 0.635 CPUs utilized
14,615 context-switches # 0.632 K/sec
3,168 cpu-migrations # 0.137 K/sec
463 page-faults # 0.020 K/sec
79,644,122,769 cycles # 3.445 GHz [30.98%]
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
104,254,073,200 instructions # 1.31 insns per cycle [38.76%]
17,489,978,097 branches # 756.431 M/sec [38.87%]
42,822,998 branch-misses # 0.24% of all branches [38.74%]
21,942,957,608 L1-dcache-loads # 949.019 M/sec [38.73%]
1,022,663,441 L1-dcache-load-misses # 4.66% of all L1-dcache hits [38.71%]
88,585,236 LLC-loads # 3.831 M/sec [30.86%]
<not supported> LLC-load-misses:HG
<not supported> L1-icache-loads:HG
11,208,770 L1-icache-load-misses:HG # 0.00% of all L1-icache hits [31.33%]
21,891,428,217 dTLB-loads:HG # 946.791 M/sec [31.21%]
807,245 dTLB-load-misses:HG # 0.00% of all dTLB cache hits [31.04%]
24,236,835 iTLB-loads:HG # 1.048 M/sec [30.99%]
387,664 iTLB-load-misses:HG # 1.60% of all iTLB cache hits [31.06%]
<not supported> L1-dcache-prefetches:HG
0 L1-dcache-prefetch-misses:HG # 0.000 K/sec [31.13%]
36.428755980 seconds time elapsed
```
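The two perf runs spend almost the same number of cycles (about 79 billion each) over the same ~36 s of wall time, yet `fast()` retires roughly twice as many instructions in those cycles. Instructions per cycle (IPC) is instructions / cycles, and the `@cpi` rows in the JMH table are its reciprocal, cycles / instructions. The sketch below (hypothetical `PerfRatios` class) recomputes both from the raw counters and reproduces perf's 2.72 and 1.31 IPC as well as JMH's 0.367 and 0.764 CPI. The bracketed percentages mean perf multiplexed these counters, so the absolute counts are scaled estimates.

```java
import java.util.Locale;

// Hypothetical helper: derives IPC/CPI from raw perf counters.
class PerfRatios {

    static double ipc(long instructions, long cycles) {
        return (double) instructions / cycles;
    }

    public static void main(String[] args) {
        // Raw counters copied from the perf stats above.
        long fastInsns = 214_945_315_883L, fastCycles = 78_964_852_601L;
        long slowInsns = 104_254_073_200L, slowCycles = 79_644_122_769L;

        double fastIpc = ipc(fastInsns, fastCycles);
        double slowIpc = ipc(slowInsns, slowCycles);

        // CPI (reported by JMH as @cpi) is the reciprocal of IPC.
        System.out.printf(Locale.ROOT, "fast: IPC %.2f, CPI %.3f%n", fastIpc, 1 / fastIpc);
        System.out.printf(Locale.ROOT, "slow: IPC %.2f, CPI %.3f%n", slowIpc, 1 / slowIpc);
    }
}
```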
@jerrinot, thanks for your time.
I'm still having trouble understanding the problem :)
No idea either; can someone please elaborate on that?
Updated:
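```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

@State(Scope.Benchmark)
public class ToArrayBench {
    private static final int SIZE = 100;
    ArrayList<Byte> list;

    @Setup
    public void setup() throws Throwable
    {
        list = new ArrayList<>();
        for (int i = 0; i < SIZE; i++) {
            list.add((byte) i);
        }
    }

    // Every result now goes into the Blackhole, so the JIT cannot discard it as dead code.
    @Benchmark
    public void zero_sized_array(Blackhole bh)
    {
        bh.consume(list.toArray(new Byte[0]));
    }

    @Benchmark
    public void simple_toArray(Blackhole bh)
    {
        bh.consume(list.toArray());
    }

    @Benchmark
    public void sized_array_from_list(Blackhole bh)
    {
        bh.consume(list.toArray(new Byte[list.size()]));
    }

    @Benchmark
    public void sized_array_fixed_size(Blackhole bh)
    {
        bh.consume(list.toArray(new Byte[SIZE]));
    }

    @Benchmark
    public void defensive_copy(Blackhole bh)
    {
        bh.consume(new ArrayList<>(list));
    }

    public static void main(String[] args) throws RunnerException, IOException
    {
        Options opt = new OptionsBuilder().include(".*" + ToArrayBench.class.getSimpleName() + ".*")
                .warmupIterations(20)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(20)
                .timeUnit(TimeUnit.MILLISECONDS)
                .forks(1)
                // .addProfiler(LinuxPerfAsmProfiler.class)
                .build();
        new Runner(opt).run();
    }
}
```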
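When reading the results, keep in mind that the five methods do not all produce a `Byte[]`. `list.toArray()` returns an `Object[]`, and in OpenJDK `new ArrayList<>(list)` builds its backing store from `list.toArray()` as well, so neither `simple_toArray` nor `defensive_copy` has to create an array whose component type is `Byte`. The plain-Java sketch below (hypothetical class `ResultTypes`, same 100-element list as `setup()`) prints the runtime array type returned by each `toArray` variant: `Object[]` for the untyped call and `Byte[]` for the three typed ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows the runtime array type each toArray variant returns.
class ResultTypes {

    static final int SIZE = 100;

    // Same contents as setup(): the bytes 0..99.
    static List<Byte> filledList() {
        List<Byte> list = new ArrayList<>();
        for (int i = 0; i < SIZE; i++) {
            list.add((byte) i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Byte> list = filledList();
        // Untyped: an Object[], even though every element is a Byte.
        System.out.println(list.toArray().getClass().getSimpleName());
        // Typed variants: the runtime component type is Byte.
        System.out.println(list.toArray(new Byte[0]).getClass().getSimpleName());
        System.out.println(list.toArray(new Byte[list.size()]).getClass().getSimpleName());
        System.out.println(list.toArray(new Byte[SIZE]).getClass().getSimpleName());
    }
}
```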
Benchmark | Mode | Cnt | Score | Error | Units |
---|---|---|---|---|---|
ToArrayBench.defensive_copy | thrpt | 200 | 16,714,192 | ± 129,515.217 | ops/s |
ToArrayBench.simple_toArray | thrpt | 200 | 17,918,950 | ± 102,801.298 | ops/s |
ToArrayBench.sized_array_fixed_size | thrpt | 200 | 5,799,136 | ± 65,921.564 | ops/s |
ToArrayBench.sized_array_from_list | thrpt | 200 | 5,643,162 | ± 85,215.009 | ops/s |
ToArrayBench.zero_sized_array | thrpt | 200 | 6,529,068 | ± 78,960.062 | ops/s |
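Whether a gap between two rows is more than noise can be eyeballed from the Error column, which JMH reports as the half-width of a confidence interval (99.9% by default). The sketch below (hypothetical `ScoreComparison` class) treats each row as the interval [score - error, score + error] and checks for overlap; non-overlapping intervals are an informal sign, not a formal test, that the difference is real. Applied to the table, zero_sized_array (6,529,068 ± 78,960) and sized_array_from_list (5,643,162 ± 85,215) do not overlap, and zero_sized_array comes out about 16% faster.

```java
import java.util.Locale;

// Hypothetical helper: informal overlap check on JMH score ± error intervals.
class ScoreComparison {

    // True if [scoreA - errA, scoreA + errA] and [scoreB - errB, scoreB + errB] intersect.
    static boolean overlaps(double scoreA, double errA, double scoreB, double errB) {
        return scoreA - errA <= scoreB + errB && scoreB - errB <= scoreA + errA;
    }

    public static void main(String[] args) {
        // ops/s values from the table above.
        double zeroSized = 6_529_068, zeroErr = 78_960.062;
        double fromList = 5_643_162, fromListErr = 85_215.009;

        System.out.println(overlaps(zeroSized, zeroErr, fromList, fromListErr)); // false
        System.out.printf(Locale.ROOT, "zero_sized / from_list = %.2f%n", zeroSized / fromList);
    }
}
```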
Routinely, I will chew on people who can't use the perfasm profiler, but it is not your fault that it wasn't helping here. ;) Only since JMH 1.5 (released yesterday) can perfasm decode the VM stubs, and the VM stubs are the crucial piece of information for untangling this. See: http://cr.openjdk.java.net/~shade/scratch/ToArrayBench.java
That's an interesting one. I've simplified your setup to include just the two cases I'm most interested in:
And indeed, the results are counter-intuitive:
However, when I change the sized_array_fixed_size() method to use a constant, the results are more in line with expectations:
I can see that the original version has a lower instructions-per-cycle count:
This is the original sized_array_fixed_size():
vs. the zero_sized_array():
The new version of sized_array_fixed_size() with a constant has an IPC similar to that of zero_sized_array():
vs. the zero_sized_array():