Consuming stack traces noticeably slower in Java 11 than Java 8
I was comparing the performance of JDK 8 and 11 using jmh 1.21 when I ran across some surprising numbers:
Java version: 1.8.0_192, vendor: Oracle Corporation
Benchmark Mode Cnt Score Error Units
MyBenchmark.throwAndConsumeStacktrace avgt 25 21525.584 ± 58.957 ns/op
Java version: 9.0.4, vendor: Oracle Corporation
Benchmark Mode Cnt Score Error Units
MyBenchmark.throwAndConsumeStacktrace avgt 25 28243.899 ± 498.173 ns/op
Java version: 10.0.2, vendor: Oracle Corporation
Benchmark Mode Cnt Score Error Units
MyBenchmark.throwAndConsumeStacktrace avgt 25 28499.736 ± 215.837 ns/op
Java version: 11.0.1, vendor: Oracle Corporation
Benchmark Mode Cnt Score Error Units
MyBenchmark.throwAndConsumeStacktrace avgt 25 48535.766 ± 2175.753 ns/op
OpenJDK 11 and 12 perform similarly to Oracle JDK 11. I have omitted their numbers for the sake of brevity.
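To put the scores above in relative terms, here is a small sketch that divides each score by the JDK 8 baseline. The class and method names (`SlowdownRatios`, `ratioToBaseline`) are made up for illustration; the numbers are copied from the JMH output above. It works out to roughly 1.3x on 9.0.4 and 10.0.2 and about 2.25x on 11.0.1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper, not part of the benchmark project.
final class SlowdownRatios {
    // Scores in ns/op, copied from the JMH output above.
    static final Map<String, Double> SCORES = new LinkedHashMap<>();
    static {
        SCORES.put("1.8.0_192", 21525.584);
        SCORES.put("9.0.4", 28243.899);
        SCORES.put("10.0.2", 28499.736);
        SCORES.put("11.0.1", 48535.766);
    }

    // Ratio of a version's score to the JDK 8 baseline; above 1 means slower.
    static double ratioToBaseline(String version) {
        return SCORES.get(version) / SCORES.get("1.8.0_192");
    }

    public static void main(String[] args) {
        for (String version : SCORES.keySet()) {
            System.out.printf("%-10s %.2fx%n", version, ratioToBaseline(version));
        }
    }
}
```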
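I understand that microbenchmarks do not indicate the performance behavior of real-life applications. Still, I'm curious where this difference is coming from. Any ideas?
Here is the benchmark in its entirety: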
pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>jmh</groupId>
<artifactId>consume-stacktrace</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>JMH benchmark sample: Java</name>
<dependencies>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>${jmh.version}</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>${jmh.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<jmh.version>1.21</jmh.version>
<javac.target>1.8</javac.target>
<uberjar.name>benchmarks</uberjar.name>
</properties>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-enforcer-plugin</artifactId>
<version>1.4.1</version>
<executions>
<execution>
<id>enforce-versions</id>
<goals>
<goal>enforce</goal>
</goals>
<configuration>
<rules>
<requireMavenVersion>
<version>3.0</version>
</requireMavenVersion>
</rules>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
<configuration>
<compilerVersion>${javac.target}</compilerVersion>
<source>${javac.target}</source>
<target>${javac.target}</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<finalName>${uberjar.name}</finalName>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>org.openjdk.jmh.Main</mainClass>
</transformer>
</transformers>
<filters>
<filter>
<!--
Shading signed JARs will fail without this.
-->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>2.6.1</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<plugin>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.0.0</version>
</plugin>
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-source-plugin</artifactId>
<version>3.0.1</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.0</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
src/main/java/jmh/MyBenchmark.java:
package jmh;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.infra.Blackhole;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark
{
    @Benchmark
    public void throwAndConsumeStacktrace(Blackhole bh)
    {
        try
        {
            throw new IllegalArgumentException("I love benchmarks");
        }
        catch (IllegalArgumentException e)
        {
            // Render the full stack trace into a string and hand it to the
            // Blackhole so the JIT cannot eliminate the work.
            StringWriter sw = new StringWriter();
            e.printStackTrace(new PrintWriter(sw));
            bh.consume(sw.toString());
        }
    }
}
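For anyone who wants a quick sanity check without setting up the Maven project, here is a rough JMH-free sketch of the same work. It is only an approximation under stated assumptions: `StackTraceSanityCheck`, `throwAndRender` and `averageNanos` are names invented here, the stack is shallower than under the JMH harness, and there is no forking or statistical treatment, so the absolute numbers will not match the scores above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-alone check; not a replacement for the JMH benchmark.
final class StackTraceSanityCheck {
    // Same work as the benchmark body: throw, render the trace, return the text.
    static String throwAndRender() {
        try {
            throw new IllegalArgumentException("I love benchmarks");
        } catch (IllegalArgumentException e) {
            StringWriter sw = new StringWriter();
            e.printStackTrace(new PrintWriter(sw));
            return sw.toString();
        }
    }

    // Crude average in nanoseconds per call over the given number of iterations.
    static double averageNanos(int iterations) {
        long sink = 0; // keeps the rendered traces observable
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += throwAndRender().length();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 0) {
            throw new IllegalStateException("unexpected empty traces");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        averageNanos(20_000); // warm-up pass, result discarded
        System.out.printf("%.0f ns/op (rough)%n", averageNanos(100_000));
    }
}
```

Running the same class on each JDK should at least show whether the ordering of the results is reproduced on another machine.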
Here is the Windows-specific script I use. It should be trivial to port it to other platforms:
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_192
call mvn -V -Djavac.target=1.8 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar
set JAVA_HOME=C:\Program Files\Java\jdk-9.0.4
call mvn -V -Djavac.target=9 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar
set JAVA_HOME=C:\Program Files\Java\jdk-10.0.2
call mvn -V -Djavac.target=10 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar
set JAVA_HOME=C:\Program Files\Java\oracle-11.0.1
call mvn -V -Djavac.target=11 clean install
"%JAVA_HOME%\bin\java" -jar target\benchmarks.jar
My execution environment is:
Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; 2018-10-24T14:41:47-04:00)
Maven home: C:\Program Files\apache-maven-3.6.0\bin\..
Default locale: en_CA, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "windows"
More specifically, I am running Microsoft Windows [Version 10.0.17763.195].
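I suspect this is due to several changes.
The 8->9 regression happened while switching to StackWalker for generating the stack traces (JDK-8150778). Unfortunately, this made VM native code intern a lot of strings, and StringTable becomes the bottleneck. If you profile OP's benchmark, you will see a profile like the one in JDK-8151751. It should be enough to perf record -g the entire JVM that runs the benchmark, and then look into perf report. (Hint, hint, you can do it yourself next time!)
And the 10->11 regression must have happened later. I suspect this is due to the StringTable preparations for switching to a fully concurrent hash table (JDK-8195100, which, as Claes points out, is not entirely in 11) or something else (class data sharing changes?).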
Either way, interning on the fast path is a bad idea, and the patch for JDK-8151751 should have dealt with both regressions.
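For readers less familiar with interning, here is a minimal Java-level sketch of what it guarantees: equal strings map to one canonical instance, which requires a table lookup on every call. It only illustrates the semantics through the public `String.intern()` API; the VM-internal path discussed here is native code, and the `InternDemo` class and `fresh` helper are invented for this example.

```java
// Illustrative only: shows the canonicalization that interning provides.
final class InternDemo {
    // Builds a new String object at run time, distinct from any literal.
    static String fresh(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = fresh("throwAndConsumeStacktrace");
        String b = fresh("throwAndConsumeStacktrace");
        // Two separate objects with equal contents.
        System.out.println("same object before intern: " + (a == b));
        // After interning, both resolve to the single canonical instance.
        System.out.println("same object after intern:  " + (a.intern() == b.intern()));
    }
}
```

Every stack frame contributes class and method names, so paying for this lookup per frame adds up quickly when a trace is materialized.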
Watch this:
8u191: 15108 ± 99 ns/op [so far so good]
- 54.55% 0.37% java libjvm.so [.] JVM_GetStackTraceElement
- 54.18% JVM_GetStackTraceElement
- 52.22% java_lang_Throwable::get_stack_trace_element
- 48.23% java_lang_StackTraceElement::create
- 17.82% StringTable::intern
- 13.92% StringTable::intern
- 4.83% Klass::external_name
+ 3.41% Method::line_number_from_bci
"head": 22382 ± 134 ns/op [regression]
- 69.79% 0.05% org.sample.MyBe libjvm.so [.] JVM_InitStackTraceElement
- 69.73% JVM_InitStackTraceElementArray
- 69.14% java_lang_Throwable::get_stack_trace_elements
- 66.86% java_lang_StackTraceElement::fill_in
- 38.48% StringTable::intern
- 21.81% StringTable::intern
- 2.21% Klass::external_name
1.82% Method::line_number_from_bci
0.97% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<573
"head" + JDK-8151751 patch: 7511 ± 26 ns/op [woot, even better than 8u]
- 22.53% 0.12% org.sample.MyBe libjvm.so [.] JVM_InitStackTraceElement
- 22.40% JVM_InitStackTraceElementArray
- 20.25% java_lang_Throwable::get_stack_trace_elements
- 12.69% java_lang_StackTraceElement::fill_in
+ 6.86% Method::line_number_from_bci
2.08% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier
2.24% InstanceKlass::method_with_orig_idnum
1.03% Handle::Handle
I investigated the issue with async-profiler, which can draw cool flame graphs demonstrating where the CPU time is spent.
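As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly a result of the StackWalker changes. Also, G1 has become the default GC since JDK 9. If we explicitly set -XX:+UseParallelGC (the default in JDK 8), the scores will be slightly better.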
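When comparing JDK versions it helps to confirm which collector a given run actually used. A minimal sketch, using the standard `java.lang.management` API, is below; the class name `ActiveCollectors` is made up here, and the exact bean names are an assumption that can vary by JVM build (typically something like "G1 Young Generation" under G1 and "PS Scavenge" under the parallel collector).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: lists the garbage collectors active in this JVM.
final class ActiveCollectors {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run once with default flags and once with -XX:+UseParallelGC to compare.
        System.out.println(collectorNames());
    }
}
```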
But the most interesting part is the slowdown in JDK 11.
Here is what async-profiler shows (clickable SVG).
The main difference between the two profiles is in the size of the java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently StringTable::intern takes a lot longer on JDK 11.
Let's zoom in:
Note that StringTable::intern in JDK 11 calls do_intern, which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in the JDK 10 profile. Time to look at the source code.
oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
  // shared table always uses java_lang_String::hash_code
  unsigned int hash = java_lang_String::hash_code(name, len);
  oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
  if (found_string != NULL) {
    return found_string;
  }
  if (StringTable::_alt_hash) {
    hash = hash_string(name, len, true);
  }
  return StringTable::the_table()->do_intern(string_or_null_h, name, len,
                                   |         hash, CHECK_NULL);
}                                  |
      ----------------------------
      |
      v
oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
                           int len, uintx hash, TRAPS) {
  HandleMark hm(THREAD);  // cleanup strings created
  Handle string_h;
  if (!string_or_null_h.is_null()) {
    string_h = string_or_null_h;
  } else {
    string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
  }
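The function in JDK 11 first looks for the string in the shared StringTable, does not find it, then goes to do_intern and immediately creates a new String object.
In the JDK 10 sources, after the call to lookup_shared, there was an additional lookup in the main table, which returned the existing string without creating a new object: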
found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);
This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".
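The difference between the two orderings can be modelled in plain Java. The sketch below is a simplified analogy, not HotSpot code: `InternTableSketch` and all of its methods are hypothetical, the shared table and all concurrency concerns are left out, and the counter tracks only String creations. It shows that looking up before creating allocates once per distinct name, while creating first allocates on every call even when the entry already exists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an intern table keyed by raw characters.
final class InternTableSketch {
    private final Map<Integer, List<String>> buckets = new HashMap<>();
    private int allocations = 0;

    int allocations() { return allocations; }

    private static boolean sameChars(String s, char[] name) {
        if (s.length() != name.length) return false;
        for (int i = 0; i < name.length; i++) {
            if (s.charAt(i) != name[i]) return false;
        }
        return true;
    }

    // Stand-in for creating a String from raw characters; counted as an allocation.
    private String create(char[] name) {
        allocations++;
        return new String(name);
    }

    // Finds an existing entry by comparing characters; creates no String.
    private String lookup(char[] name) {
        List<String> bucket = buckets.get(Arrays.hashCode(name));
        if (bucket != null) {
            for (String s : bucket) {
                if (sameChars(s, name)) return s;
            }
        }
        return null;
    }

    // Returns the existing entry if present, otherwise stores the candidate.
    private String insertOrGet(char[] name, String candidate) {
        List<String> bucket =
                buckets.computeIfAbsent(Arrays.hashCode(name), k -> new ArrayList<>());
        for (String s : bucket) {
            if (sameChars(s, name)) return s; // candidate was created for nothing
        }
        bucket.add(candidate);
        return candidate;
    }

    // Ordering described for JDK 10: look up first, create only on a miss.
    String internLookupFirst(char[] name) {
        String found = lookup(name);
        return found != null ? found : insertOrGet(name, create(name));
    }

    // Ordering described for JDK 11: create the String up front, then insert or get.
    String internCreateFirst(char[] name) {
        return insertOrGet(name, create(name));
    }

    public static void main(String[] args) {
        char[] name = "throwAndConsumeStacktrace".toCharArray();
        InternTableSketch lookupFirst = new InternTableSketch();
        InternTableSketch createFirst = new InternTableSketch();
        for (int i = 0; i < 1000; i++) {
            lookupFirst.internLookupFirst(name);
            createFirst.internCreateFirst(name);
        }
        System.out.println("lookup first: " + lookupFirst.allocations() + " allocations");
        System.out.println("create first: " + createFirst.allocations() + " allocations");
    }
}
```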
TL;DR While interning method names in JDK 11, HotSpot creates redundant String objects. This has happened after JDK-8195097.