Why is the sum of reciprocals using a for-loop ~400x faster than streams?
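This code benchmarks 3 different ways of computing the sum of the reciprocals of the elements of a double[]:
- a for loop
- Java 8 streams
- the colt math library
Why is the computation with a simple for loop ~400x faster than the one using streams? (Or is there something in the benchmarking code that needs to be improved? Or is there a faster way of computing this with streams?)
Code: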
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import cern.colt.list.DoubleArrayList;
import cern.jet.stat.Descriptive;
import org.openjdk.jmh.annotations.*;
@State(Scope.Thread)
public class MyBenchmark {
public static double[] array;
static {
int num_of_elements = 100;
array = new double[num_of_elements];
for (int i = 0; i < num_of_elements; i++) {
array[i] = i+1;
}
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void testInversionSumForLoop(){
double result = 0;
for (int i = 0; i < array.length; i++) {
result += 1.0/array[i];
}
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void testInversionSumUsingStreams(){
double result = 0;
result = Arrays.stream(array).map(d -> 1/d).sum();
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public void testInversionSumUsingCernColt(){
double result = Descriptive.sumOfInversions(new DoubleArrayList(array), 0, array.length-1);
}
}
Results:
/**
* Results
* Benchmark Mode Cnt Score Error Units
* MyBenchmark.testInversionSumForLoop avgt 200 1.647 ± 0.155 ns/op
* MyBenchmark.testInversionSumUsingCernColt avgt 200 603.254 ± 22.199 ns/op
* MyBenchmark.testInversionSumUsingStreams avgt 200 645.895 ± 20.833 ns/op
*/
Update: these results show that Blackhole.consume or returning the value is necessary to keep the JVM from optimizing the computation away.
/**
* Updated results after adding Blackhole.consume
* Benchmark Mode Cnt Score Error Units
* MyBenchmark.testInversionSumForLoop avgt 200 525.498 ± 10.458 ns/op
* MyBenchmark.testInversionSumUsingCernColt avgt 200 517.930 ± 2.080 ns/op
* MyBenchmark.testInversionSumUsingStreams avgt 200 582.103 ± 3.261 ns/op
*/
Oracle JDK version "1.8.0_181", Darwin Kernel Version 17.7.0
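The update mentions returning the value as an alternative to Blackhole.consume. A minimal sketch of that shape in plain Java follows, so it compiles without JMH on the classpath. The class name ReciprocalSumSketch and the helper makeArray are made up for illustration; under JMH the two sum methods would be instance methods annotated with @Benchmark, and the harness would consume the returned double.

```java
import java.util.Arrays;

public class ReciprocalSumSketch {

    // Same data as in the question: 1.0, 2.0, ..., n (hypothetical helper)
    static double[] makeArray(int n) {
        double[] a = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = i + 1;
        }
        return a;
    }

    // Under JMH this would carry @Benchmark. Returning the result keeps the
    // loop observable, so the JIT cannot treat it as dead code.
    static double sumWithLoop(double[] array) {
        double result = 0;
        for (int i = 0; i < array.length; i++) {
            result += 1.0 / array[i];
        }
        return result;
    }

    // Stream variant, also returning the value instead of discarding it
    static double sumWithStream(double[] array) {
        return Arrays.stream(array).map(d -> 1 / d).sum();
    }

    public static void main(String[] args) {
        double[] array = makeArray(100);
        System.out.println("loop:   " + sumWithLoop(array));
        System.out.println("stream: " + sumWithStream(array));
    }
}
```

The two values may differ in the last few bits, since DoubleStream.sum() is allowed to use compensated summation, so compare them with a tolerance rather than with ==.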
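In your example, the JVM has most likely optimized the loop away entirely, because the value of result is never read after it is computed. You should use Blackhole to consume result, as shown below: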
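import static java.util.concurrent.TimeUnit.MILLISECONDS;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
@State(Scope.Thread)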
@Warmup(iterations = 10, time = 200, timeUnit = MILLISECONDS)
@Measurement(iterations = 20, time = 500, timeUnit = MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {
static double[] array;
static {
int num_of_elements = 100;
array = new double[num_of_elements];
for (int i = 0; i < num_of_elements; i++) {
array[i] = i + 1;
}
}
double result = 0;
@Benchmark
public void baseline(Blackhole blackhole) {
result = 1;
result = result / 1.0;
blackhole.consume(result);
}
@Benchmark
public void testInversionSumForLoop(Blackhole blackhole) {
for (int i = 0; i < array.length; i++) {
result += 1.0 / array[i];
}
blackhole.consume(result);
}
@Benchmark
public void testInversionSumUsingStreams(Blackhole blackhole) {
result = Arrays.stream(array).map(d -> 1 / d).sum();
blackhole.consume(result);
}
}
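This new benchmark shows the expected difference of about 4x. Loops benefit from a number of optimizations in the JVM and, unlike streams, do not involve the creation of new objects.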
Benchmark Mode Cnt Score Error Units
MyBenchmark.baseline avgt 100 2.437 ± 0.139 ns/op
MyBenchmark.testInversionSumForLoop avgt 100 135.512 ± 13.080 ns/op
MyBenchmark.testInversionSumUsingStreams avgt 100 506.479 ± 4.209 ns/op
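For reference, the ratios behind "~400x" and "about 4x" can be worked out directly from the scores reported in this thread. The sketch below only does that arithmetic. The class and method names are made up for illustration, and the per-element figure assumes the cost is spread evenly over the 100 array elements.

```java
public class SpeedupArithmetic {

    // How many times slower the first score is than the second
    static double ratio(double slowerNsPerOp, double fasterNsPerOp) {
        return slowerNsPerOp / fasterNsPerOp;
    }

    // Average cost per array element, assuming the cost is spread evenly
    static double perElement(double nsPerOp, int elements) {
        return nsPerOp / elements;
    }

    public static void main(String[] args) {
        // Original run from the question: the loop result was never read
        System.out.printf("original streams/loop: %.1fx%n", ratio(645.895, 1.647));

        // Run with Blackhole.consume from the answer
        System.out.printf("fixed streams/loop:    %.2fx%n", ratio(506.479, 135.512));

        // Per-element cost in the fixed run (100 elements)
        System.out.printf("loop per element:      %.3f ns%n", perElement(135.512, 100));
        System.out.printf("streams per element:   %.3f ns%n", perElement(506.479, 100));
    }
}
```

With the fixed benchmark the loop costs a little over 1 ns per element, which is plausible for a division plus an addition, whereas 1.647 ns for the whole 100-element loop in the original run is not.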
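I tried adding a baseline to show what a single operation costs on my machine. The baseline ns/op is similar to your loop's ns/op, which IMO confirms that your loop was optimized away.
I hope someone can tell me what a good baseline for this benchmark scenario would be.
My environment: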
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux