Converting float to uint64 and uint32 behaves strangely

Question

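When I convert a 32-bit float to a 64-bit unsigned integer in C++, everything works as expected. An overflow causes the FE_OVERFLOW flag (from cfenv) to be set, and the value 0 is returned.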

std::feclearexcept(FE_ALL_EXCEPT);
float a = ...;
uint64_t b = a;
std::fexcept_t flags;
std::fegetexceptflag(&flags, FE_ALL_EXCEPT);

But when I convert a 32-bit float to a 32-bit unsigned integer like this:

std::feclearexcept(FE_ALL_EXCEPT);
float a = ...;
uint32_t b = a;
std::fexcept_t flags;
std::fegetexceptflag(&flags, FE_ALL_EXCEPT);

I get exactly the same behavior as the 64-bit conversion, except that the upper 32 bits are truncated. It is equivalent to:

std::feclearexcept(FE_ALL_EXCEPT);
float a = ...;
uint64_t b2 = a;
uint32_t b = b2 & std::numeric_limits<uint32_t>::max();
std::fexcept_t flags;
std::fegetexceptflag(&flags, FE_ALL_EXCEPT);

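So overflow is only flagged when the exponent is greater than or equal to 64. For exponents between 32 and 64, it returns the lower 32 bits of the 64-bit conversion without setting overflow. This is odd, because you would expect it to overflow at exponent 32.

Is this how it is supposed to work, or am I doing something wrong?

The compiler is: LLVM version 6.0 (clang-600.0.45.3) (based on LLVM 3.5svn)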

Answer 1

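Overflow in a conversion from floating-point to integer is undefined behavior. You cannot rely on the conversion being done with a single assembly instruction, nor on it being done with an instruction that overflows for exactly the set of values for which you would like the overflow flag to be set.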

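The assembly instruction cvttsd2si, which has probably been generated, does set flags on overflow, but the 64-bit variant of the instruction may be generated even when converting to a 32-bit integer type. One good reason to do so is when truncating a floating-point value to an unsigned 32-bit integer, as in your question: after the 64-bit signed instruction executes, all 32 low bits of the destination register are correctly set for every floating-point value for which the conversion is defined. There is no unsigned variant of the cvttsd2si instruction.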
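The effect described above can be imitated in portable C++. This is only a sketch of the idea, not the compiler's actual output: `via_signed64` is a made-up name, and the function assumes its argument is greater than -1 and less than 2^63 so that the intermediate signed 64-bit cast is itself well defined.

```cpp
#include <cstdint>

// Hypothetical illustration: convert through a signed 64-bit integer and
// keep only the low 32 bits, as a compiler targeting x86-64 might do.
// Assumed precondition: -1 < f < 2^63, so the int64_t cast is defined.
std::uint32_t via_signed64(float f) {
    std::int64_t wide = static_cast<std::int64_t>(f);
    // Integer-to-unsigned conversion is modular, so this keeps the low 32 bits.
    return static_cast<std::uint32_t>(wide);
}
```

For inputs that fit in 32 bits, such as 3000000000.0f, this returns the same value as a direct cast to uint32_t. For an input such as 4294967808.0f (2^32 + 512), it returns 512 without any error indication, which matches the behavior reported in the question for exponents between 32 and 64.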

From the Intel manual:

CVTTSD2SI—Convert with Truncation Scalar Double-Precision FP Value to Signed Integer

…

If a converted result exceeds the range limits of signed doubleword integer (in non-64-bit modes or 64-bit mode with REX.W/VEX.W=0), the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.

If a converted result exceeds the range limits of signed quadword integer (in 64-bit mode and REX.W/VEX.W = 1), the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000_00000000H) is returned.

This blog post, although written about C, expands on the topic.
