std::chrono::clock, hardware clock and cycle count

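std::chrono offers several clocks to measure time. At the same time, I guess the only way a CPU can evaluate time is by counting cycles.

Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?

If that is the case, because the way a computer counts cycles will never be as precise as an atomic clock, it means that a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than an actual second, causing differences in the long run for time measurements between the computer clock and, let's say, GPS.

Question 2: Is that correct?

Some hardware has varying frequencies (for example idle mode and turbo mode). In that case, it would mean that the number of cycles varies during a second.

Question 3: Is the "cycle count" measured by CPUs and GPUs varying depending on the hardware frequency? If yes, then how does std::chrono deal with it? If not, what does a cycle correspond to (like what is the "fundamental" time)? Is there a way to access the conversion at compile time? Is there a way to access the conversion at runtime?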

Counting cycles, yes, but cycles of what?

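On a modern x86, the time source used by the kernel (internally, and for clock_gettime and other system calls) is typically a fixed-frequency counter that counts "reference cycles" regardless of turbo, power saving, or clock-stopped idle. (This is the counter you get from rdtsc, or __rdtsc() in C/C++.)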
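As a rough illustration of reading that counter directly, here is a minimal sketch. It assumes GCC or Clang on x86, where __rdtsc() comes from <x86intrin.h>; the wrapper name read_tsc is made up for this example, and the fallback value of 0 on other architectures is only a placeholder.

```cpp
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // __rdtsc() on GCC/Clang (assumption: not MSVC)
#endif

// Hypothetical wrapper: returns the raw time stamp counter value.
// The result is in reference cycles, not seconds; turning it into time
// needs the TSC frequency, which the OS calibrates at boot.
inline std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return 0;  // placeholder: no TSC on this architecture
#endif
}
```

Two successive calls give a difference in reference cycles. Note that rdtsc is not a serializing instruction, so careful measurements usually add fences around it.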

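Normal std::chrono implementations will use an OS-provided function like clock_gettime on Unix. (On Linux, this can run purely in user space: the code plus scale-factor data live in a VDSO page that the kernel maps into every process's address space. Low-overhead time sources are nice. Avoiding a user->kernel->user round trip helps a lot with Meltdown + Spectre mitigation enabled.)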
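From the C++ side none of this is visible; you just ask a clock for the current time. A minimal sketch of timing a piece of work with std::chrono::steady_clock follows. The helper name elapsed_ns is hypothetical, and which OS call sits underneath depends on the standard library implementation.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `work` once and returns the elapsed time in
// nanoseconds as reported by std::chrono::steady_clock.
template <typename F>
std::int64_t elapsed_ns(F&& work) {
    using clock_type = std::chrono::steady_clock;  // monotonic clock
    const auto start = clock_type::now();
    work();
    const auto stop = clock_type::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

steady_clock is used rather than system_clock because it is guaranteed never to go backwards, even when the wall-clock time is adjusted.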

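Profiling a tight loop that isn't memory-bound might call for actual core clock cycles instead, so the result is insensitive to the actual speed of the current core (and you don't have to worry about ramping the CPU up to max turbo, etc.), e.g. using perf stat ./a.out or perf record ./a.out.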

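Some systems didn't / don't have a wall-clock-equivalent counter built right into the CPU, so either the OS maintains a time in RAM that it updates on timer interrupts, or time-query functions read the time from a separate chip.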

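(System call + hardware I/O = higher overhead, which is part of the reason that x86's rdtsc instruction morphed from a profiling thing into a clock-source thing.)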

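All of these clock frequencies are ultimately derived from a crystal oscillator on the motherboard. But the scale factors used to extrapolate time from cycle counts can be adjusted to keep the clock in sync with atomic time, typically using the Network Time Protocol (NTP), as @Tony points out.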
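To make the "scale factor" idea concrete, here is a simplified sketch of converting a tick count to nanoseconds. The function ticks_to_ns and its parameter ticks_per_second are invented for illustration and are not taken from any particular kernel; real implementations use more refined fixed-point arithmetic. Adjusting ticks_per_second is the kind of correction a time-synchronization daemon effectively applies.

```cpp
#include <cstdint>

// Hypothetical illustration: convert a raw tick count to nanoseconds.
// `ticks_per_second` stands in for the calibrated counter frequency.
// Assumes ticks_per_second is nonzero and below about 1.8e10, so the
// intermediate product fits in 64 bits.
constexpr std::uint64_t ticks_to_ns(std::uint64_t ticks,
                                    std::uint64_t ticks_per_second) {
    // Split into whole seconds and a remainder to avoid overflow.
    const std::uint64_t secs = ticks / ticks_per_second;
    const std::uint64_t rem = ticks % ticks_per_second;
    return secs * 1000000000ULL + rem * 1000000000ULL / ticks_per_second;
}
```

For example, suppose a counter is nominally 3,000,000,000 ticks/s but its crystal runs 10 ppm fast, so it actually produces 3,000,030,000 ticks per true second. Using the nominal value, one true second is reported as 1,000,010,000 ns, an error of 10 µs per second, or about 0.86 s per day. Using the corrected frequency gives exactly 1,000,000,000 ns.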

Question 1: Does a CPU or a GPU have any other way to evaluate time than by counting cycles?

Different hardware may provide different facilities. For example, x86 PCs have employed several hardware facilities for timing: for the last decade or so, x86 CPUs have had Time Stamp Counters operating at their processing frequency or, more recently, at some fixed frequency (a "constant rate" aka "invariant" TSC); there may be a High Precision Event Timer; and going back further there were Programmable Interval Timers (https://en.wikipedia.org/wiki/Programmable_interval_timer).

If that is the case, because the way a computer counts cycles will never be as precise as an atomic clock, it means that a "second" (period = std::ratio<1>) for a computer can actually be shorter or longer than an actual second, causing differences in the long run for time measurements between the computer clock and, let's say, GPS.

Yes, a computer without an atomic clock (they're now available on a chip) isn't going to be as accurate as an atomic clock. That said, services such as the Network Time Protocol allow you to maintain tighter coherence across a bunch of computers. It is sometimes aided by Pulse Per Second (PPS) techniques. More modern and accurate variants include the Precision Time Protocol (PTP), which can often achieve sub-microsecond accuracy across a LAN.

Question 3: Is the "cycle count" measured by cpu and gpus varying depending on the hardware frequency?

It depends. For the TSC, newer "constant rate" implementations don't vary with the core frequency; older ones do.

If yes, then how does std::chrono deal with it?

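I'd expect most implementations to call an OS-provided time service, as the OS tends to have the best knowledge of and access to the hardware. There are a lot of factors to consider, e.g. whether TSC readings are in sync across cores, what happens if the PC goes into some kind of sleep mode, and what kind of memory fences are desirable around the TSC sampling.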

If not, what does a cycle correspond to (like what is the "fundamental" time)?

For Intel CPUs, see this answer.

Is there a way to access the conversion at compile-time? Is there a way to access the conversion at runtime?

std::chrono::duration::count exposes raw tick counts for whatever time source was used, and you can duration_cast to other units of time (e.g. seconds). C++20 introduces further facilities like clock_cast. AFAIK, there's no constexpr conversion available for TSC cycles; that would seem dubious anyway, since a program might end up running on a machine with a different TSC rate than the machine it was compiled on.
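What is available at compile time is the tick period of a clock's duration type, which is a std::ratio. The sketch below shows one way to read it; the names clock_type, seconds_per_tick and to_seconds are chosen for this example. Note that this period describes the units std::chrono reports in (often nanoseconds), not the frequency of any hardware counter.

```cpp
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Seconds per tick of the clock's duration type, known at compile time.
constexpr double seconds_per_tick =
    static_cast<double>(clock_type::period::num) / clock_type::period::den;

// duration_cast is constexpr, so unit conversions can be checked statically.
static_assert(std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::seconds(2)).count() == 2000,
              "2 s should be 2000 ms");

// Runtime conversion of a measured duration to seconds as a double.
inline double to_seconds(clock_type::duration d) {
    return std::chrono::duration<double>(d).count();
}
```

At runtime, to_seconds(stop - start) applied to two clock_type::now() readings gives the elapsed time in seconds.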
