Async profiler overhead on Zing
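Our team monitors the latency of our application with HdrHistograms. When I attach async profiler to it, all percentiles increase dramatically.
OS: Red Hat Enterprise Linux release 8.1 (Ootpa)
JVM: 11.0.8-zing_20.08.2.0-b2-product-linux-X86_64
This is what happens if I attach the profiler with the flags -i 1000 -t:
This is what happens if I attach the profiler with the flags -i 100000 -t:
Lowering the sampling frequency clearly reduced the overhead, but it is still significant. I have two questions:
- Is there any way to reduce the profiling overhead other than lowering the sampling frequency? Maybe some magic kernel/JVM flags?
- Can this overhead significantly distort the profile itself?
Thanks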
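The profiling interval is given in nanoseconds. You can specify the unit explicitly, e.g. -i 10ms. In your case, -i 1000 means 1000 nanoseconds, which is not a reasonable sampling interval: the process will do nothing but take samples back to back instead of doing useful work, and of course the resulting profile will not reflect reality. Start with the default interval (10ms) and decrease it only when absolutely necessary.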
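To make the units concrete, here is a minimal sketch that converts the interval values from the question into a nominal sampling rate. The helper `parse_interval_ns` is hypothetical and written only for illustration; it is not async-profiler's own parser, and the set of suffixes it accepts is an assumption. It simply follows the rule stated above: a bare number is nanoseconds, and a suffix such as `ms` sets the unit explicitly.

```python
# Hypothetical helper for illustration only (not async-profiler code):
# converts an interval string to nanoseconds, treating a bare number
# as nanoseconds, as described in the answer.
UNITS_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def parse_interval_ns(text: str) -> int:
    text = text.strip().lower()
    # Check two-letter suffixes before "s" so "10ms" is not read as seconds.
    for suffix in ("ns", "us", "ms", "s"):
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * UNITS_NS[suffix]
    return int(text)  # no unit given: nanoseconds


def samples_per_second(interval_ns: int) -> float:
    # Nominal rate: one sample every interval_ns nanoseconds.
    return 1_000_000_000 / interval_ns


for flag in ("1000", "100000", "10ms"):
    ns = parse_interval_ns(flag)
    print(f"-i {flag}: {ns} ns, about {samples_per_second(ns):,.0f} samples per second")
```

Read this way, -i 1000 asks for roughly a million samples per second and -i 100000 for about ten thousand, while the default 10ms asks for about a hundred.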
I explained the reasonable range in this answer:
As to the profiling interval, 10 ns is roughly 20-50 cpu instructions.
It's literally impossible to take samples at such rate. The process
will do nothing but spending all time inside the profiler.
The default sampling interval in cpu mode is 10ms. This choice is good
enough for profiling in production: for an average application the
profiling overhead will be negligible, while the number of samples
will be enough to collect a meaningful profile.
1ms interval is usually fine for benchmarks and for profiling real
applications for a short period of time. Lower intervals are rarely
useful - maybe, only for capturing a profile of a short running piece
of code.
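A rough back-of-the-envelope model shows why the ranges above behave so differently. The sketch below assumes a fixed cost of 20 microseconds per sample; that figure is purely hypothetical, chosen for illustration and not measured on Zing or any other JVM, and the real cost depends on stack depth, the JVM, and the hardware. The model itself (one sample of fixed cost after every interval of useful work) is also a simplification.

```python
# ASSUMPTION: 20 us per sample is a made-up illustrative cost,
# not a measured value for async-profiler or Zing.
ASSUMED_SAMPLE_COST_NS = 20_000


def profiler_time_fraction(interval_ns: int,
                           sample_cost_ns: int = ASSUMED_SAMPLE_COST_NS) -> float:
    # Simplified model: after every interval_ns of useful work the
    # process spends sample_cost_ns taking one sample.
    return sample_cost_ns / (interval_ns + sample_cost_ns)


intervals = [("1 us", 1_000), ("100 us", 100_000),
             ("1 ms", 1_000_000), ("10 ms", 10_000_000)]

for label, interval_ns in intervals:
    fraction = profiler_time_fraction(interval_ns)
    print(f"{label:>6} interval: {fraction:.1%} of time spent sampling")
```

Under that assumed cost, a 1 us interval leaves the process spending about 95% of its time inside the profiler, whereas a 10 ms interval costs about 0.2%. The exact numbers will differ in practice, but the shape of the curve is why the default interval is the recommended starting point.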