Is There An Exhaustive Profiler?

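I often see people benchmark/profile (or suggest that others benchmark/profile) a specific piece of code, on a specific CPU, in a specific computer, under specific conditions, and then (possibly wrongly) assume the result applies to that code under different conditions (e.g. other logical CPUs in the same core under a different load), on a wide range of very different CPUs (e.g. "all 64-bit 80x86"), in a variety of different computers (e.g. with different RAM timings, etc.).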

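What I'm looking for is a profiler that can generate profiling results for many CPUs under many conditions (mostly by interpreting the code rather than by measuring it directly), and then combine all of those results using weighting factors (where each weighting factor represents how much the user cares about that case) to produce a result that is actually useful and not misleading.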
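The combining step described above is essentially a weighted average over (CPU, condition) cases. A minimal sketch, assuming each case already has a predicted cycle count from some model; the CPU labels, conditions, cycle counts and weights below are made-up illustrations, not real data. It also reports the worst case among cases the user cares about, since a weighted mean alone can hide a bad outlier.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One (CPU model, condition) case and its predicted cost."""
    cpu: str          # hypothetical CPU model label
    condition: str    # e.g. "idle sibling" / "busy sibling"
    cycles: float     # predicted cycles for the code under test
    weight: float     # how much the user cares about this case (>= 0)


def weighted_summary(results):
    """Return (weighted mean, worst case) of predicted cycles.

    Cases with weight 0 are ignored for the worst case.
    """
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one case needs a positive weight")
    mean = sum(r.cycles * r.weight for r in results) / total_weight
    worst = max(r.cycles for r in results if r.weight > 0)
    return mean, worst


# Illustrative, invented cases: the user cares 3x more about cpu_a when idle.
cases = [
    CaseResult("cpu_a", "idle sibling", 100.0, 3.0),
    CaseResult("cpu_a", "busy sibling", 160.0, 1.0),
    CaseResult("cpu_b", "idle sibling", 140.0, 1.0),
]

if __name__ == "__main__":
    mean, worst = weighted_summary(cases)
    print(f"weighted mean = {mean:.1f} cycles, worst case = {worst:.1f} cycles")
```

With these inputs the weighted mean is 120.0 cycles and the worst case is 160.0 cycles; the hard part, producing trustworthy per-case `cycles` values, is exactly what the question is asking for.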

Is there a profiling tool that fits this description?

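I don't think any general-purpose performance prediction tool has been published on the internet, but CPU vendors may have some internal ones that they use to optimize their next microarchitecture.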

There is the valgrind binary instrumentation platform, with the (slow) callgrind/cachegrind profilers, which use simple models. Callgrind counts basic block executions in a model where one instruction costs roughly one CPU clock; cachegrind additionally models memory accesses with a simple two-level cache model and can also model a simple branch predictor. Neither tool has any knowledge or model of the wide decode/execution/retire capabilities of the modern OOO CPUs from Vendor 1 and Vendor 2 that make up the "all 64-bit 80x86" compatible CPUs (and these OOO CPUs are similar in their basic OOO features and performance).
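To make "simple model" concrete, here is a toy sketch in the spirit of the models described above: one cycle per instruction, plus fixed penalties for misses in a two-level set-associative LRU cache. This is not cachegrind's code; the cache sizes and the 10/100-cycle penalties are hypothetical. Note what it cannot express: overlapping misses, wide decode, out-of-order execution, which is the limitation the answer points out.

```python
from collections import OrderedDict


class LRUCache:
    """Toy set-associative cache with LRU replacement."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        line = addr // self.line_bytes
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)          # mark as most recently used
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)        # evict least recently used line
        s[line] = True
        return False


def estimate_cycles(n_instructions, data_addrs, l1, ll,
                    l1_penalty=10, ll_penalty=100):
    """1 cycle per instruction plus fixed (hypothetical) miss penalties."""
    l1_misses = ll_misses = 0
    for addr in data_addrs:
        if not l1.access(addr):
            l1_misses += 1
            if not ll.access(addr):      # only L1 misses reach the last level
                ll_misses += 1
    return n_instructions + l1_misses * l1_penalty + ll_misses * ll_penalty


# Stream a 4 KiB buffer twice, one access per 64-byte line.
trace = list(range(0, 4096, 64)) * 2
l1 = LRUCache(1024, 64, 2)   # hypothetical tiny L1: 1 KiB, 2-way
ll = LRUCache(8192, 64, 4)   # hypothetical last level: 8 KiB, 4-way
cycles = estimate_cycles(len(trace), trace, l1, ll)

if __name__ == "__main__":
    print(l1.misses, ll.misses, cycles)
```

Here the buffer does not fit in the toy L1, so every access misses there (128 misses), while the second pass hits in the last level (64 misses total), giving 128 + 128*10 + 64*100 = 7808 "cycles". Changing the hypothetical sizes or penalties changes the answer, which is why such a number says little about any particular real CPU.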

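There are several open-source OOO CPU simulator projects (ranging from slow to very slow), for example MARSSx86 (http://marss86.org/, 2012), based on PTLsim (http://www.facom.ufms.br/~ricardo/Courses/CompArchII/Tools/PTLSim/PTLsimManual.pdf, 2007), or the Sniper multicore simulator (built on the Graphite framework). (There is also the DRAMSim/DRAMSim2 memory simulator for accurate system simulation, which is used by several other simulator projects; it can optionally be used with the RISC-V Rocket-Chip simulator.)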

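You may be interested in some cycle-accurate simulators / microarchitecture simulators (very, very slow: tens of KIPS), but there are not many open-source variants of them. There are some commercial simulators (e.g. in the ARM world: ARM Cycle Models / CPAKs; ARC nSIM, ...), or simplescalar.com. There are also internal proprietary simulators (which we cannot access).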

The only public approximation of a microarchitecture/cycle simulator is Vendor 1's IACA: https://software.intel.com/en-us/articles/intel-architecture-code-analyzer (an inexact, partial model of OOO port scheduling for short code sequences such as inner loops, without any memory hierarchy modeling). There is also another tool from Vendor 1, "SDE", built on the PIN binary rewriting tool, for estimating/debugging some future CPU instruction extensions on older CPUs: https://software.intel.com/en-us/articles/intel-software-development-emulator
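The kind of "port scheduling" model mentioned above can be illustrated with a toy port-pressure estimate for one loop iteration. This sketch only shows the general idea, not IACA's actual algorithm. The uop table, port names and issue width are hypothetical and not taken from any real CPU, and the greedy assignment is a heuristic, not an optimal one. It assumes each port accepts one uop per cycle and ignores dependencies and memory entirely.

```python
def port_pressure(uops, ports):
    """Greedily assign each uop to its currently least-loaded allowed port."""
    load = {p: 0 for p in ports}
    for _name, allowed in uops:
        best = min(allowed, key=lambda p: load[p])
        load[best] += 1
    return load


def cycles_per_iteration(uops, ports, issue_width):
    """Estimate: the busier of the hottest port and the front end."""
    load = port_pressure(uops, ports)
    return max(max(load.values()), len(uops) / issue_width), load


# Hypothetical uop table for one loop iteration (illustrative port names).
PORTS = ["p0", "p1", "p2", "p3", "p5", "p6"]
LOOP_BODY = [
    ("load", ["p2", "p3"]),
    ("load", ["p2", "p3"]),
    ("load", ["p2", "p3"]),
    ("fma", ["p0", "p1"]),
    ("fma", ["p0", "p1"]),
    ("add", ["p0", "p1", "p5", "p6"]),
    ("cmp+jcc", ["p6"]),
]
bound, load = cycles_per_iteration(LOOP_BODY, PORTS, issue_width=4)

if __name__ == "__main__":
    print(load)
    print("estimated cycles/iteration:", bound)
```

With this made-up table, the three loads share two load ports, so the estimate is 2 cycles per iteration (the front-end term is only 7/4 = 1.75). A different CPU with a different uop table gives a different answer, which is the question's point about results not transferring between CPUs.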
