Is there a formula to find the number of bits for either the exponent or the significand in a floating-point number?

Question

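Recently, I have become interested in using bit shifts on floating-point numbers to do some fast calculations.

To make them work in a more generic way, I would like my functions to work with different floating-point types, probably through templates, not limited to float and double but also including half-precision or quadruple-precision floating-point numbers and so on.

Then I noticed:

 - Half   ---  5 exponent bits  ---  10 significand bits
 - Float  ---  8 exponent bits  ---  23 significand bits
 - Double --- 11 exponent bits  ---  52 significand bits

So far I thought exponent bits = logbase2(total bytes) * 3 + 2,
which would mean that a 128-bit float should have 14 exponent bits and a 256-bit float should have 17 exponent bits.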

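However, I later learned:

 - Quad   --- 15 exponent bits  ---  112 significand bits
 - Octuple--- 19 exponent bits  ---  236 significand bits

So, is there a formula to find it? Or is there a way to obtain it through some built-in functions?
C or C++ is preferred, but I am open to other languages.

Thanks.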

Answer 1

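I want to see if there's a formula, so that if, say, a 512-bit float is adopted as a standard, it would automatically work with it, without the need to alter anything

I am not aware of any published standard that guarantees the bit allocation for future formats (*). Past history shows that several considerations go into the final choice; see for example the answer and links at [=21=].
(*) EDIT: see the note added at the end.

As a guessing game, the 5 existing binary formats defined by IEEE-754 hint that the number of exponent bits grows slightly faster than linear. One (arbitrary) formula that fits these 5 data points could be, for example (in WA notation), exponent_bits = round( (log2(total_bits) - 1)^(3/2) ).

This would predict that a hypothetical binary512 format would assign 23 bits to the exponent, though of course IEEE is not bound in any way by such second-guessing.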

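The above is just an interpolation formula that happens to match the 5 known exponent widths, and it is certainly not the only such formula. For example, searching OEIS for the sequence 5,8,11,15,19 finds 18 listed integer sequences that contain it as a subsequence.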

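[ EDIT ] As pointed out in @EricPostpischil's answer, IEEE 754-2008 does in fact list the formula exponent_bits = round( 4 * log2(total_bits) - 13 ) for total_bits >= 128 (the formula also happens to hold for total_bits = 64, but not for 32 or 16).

The empirical formula above matches the reference IEEE formula for 128 <= total_bits <= 1472; in particular, IEEE also gives 23 exponent bits for binary512 and 27 exponent bits for binary1024.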

Answer 2

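The answer is no.

How many bits to use (or even which representation to use) is decided by compiler implementers and committees. And there is no way to guess what a committee decided (and no, it is not the "best" solution for any reasonable definition of "best"... it is just what happened that day in that room: a historical accident).

If you really want to get down to that level, you need to actually test your code on the platforms you want to deploy to, and add some #ifdef macros (or ask the user) to find out which kind of system your code is running on.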

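Also note that, in my experience, one area where compilers are extremely aggressive (to the point of being annoying) about type aliasing is floating-point numbers.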

Answer 3

Characteristics provided via built-in functions

C++ provides this information through the std::numeric_limits template:

#include <iostream>
#include <limits>
#include <cmath>


template<typename T> void ShowCharacteristics()
{
    int radix = std::numeric_limits<T>::radix;

    std::cout << "The floating-point radix is " << radix << ".\n";

    std::cout << "There are " << std::numeric_limits<T>::digits
        << " base-" << radix << " digits in the significand.\n";

    int min = std::numeric_limits<T>::min_exponent;
    int max = std::numeric_limits<T>::max_exponent;

    std::cout << "Exponents range from " << min << " to " << max << ".\n";
    std::cout << "So there must be " << std::ceil(std::log2(max-min+1))
        << " bits in the exponent field.\n";
}


int main()
{
    ShowCharacteristics<double>();
}

Sample output:

The floating-point radix is 2.
There are 53 base-2 digits in the significand.
Exponents range from -1021 to 1024.
So there must be 11 bits in the exponent field.

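C also provides this information, via macro definitions such as DBL_MANT_DIG defined in <float.h>, but the standard defines the names only for the types float (prefix FLT), double (DBL), and long double (LDBL), so the names in a C implementation that supports additional floating-point types are not predictable.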

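Note that the exponent as specified in the C and C++ standards is off by one from the usual exponent described in IEEE-754: it is adjusted for a significand scaled to [½, 1) instead of [1, 2), so it is one greater than the usual IEEE-754 exponent. (The example above shows the exponent ranges from −1021 to 1024, but the IEEE-754 exponent range is −1022 to 1023.)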

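Formula

IEEE-754 does provide formulas for recommended field widths, but it does not require IEEE-754 implementations to conform to them, and of course the C and C++ standards do not require C and C++ implementations to conform to IEEE-754. The interchange format parameters are specified in IEEE 754-2008 3.6, and the binary parameters are:

For a floating-point format of 16, 32, 64, or 128 bits, the significand width (including the leading bit) is 11, 24, 53, or 113 bits, and the exponent field width is 5, 8, 11, or 15 bits.
Otherwise, for a floating-point format of k bits, k must be a multiple of 32, the significand width is k−round(4•log₂k)+13, and the exponent field width is round(4•log₂k)−13.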

Answer 4

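Similar to the concept mentioned above, here is an alternative formula (just rearranging some terms) that computes the unsigned integer range of the exponent ([32, 256, 2048, 32768, 524288], which are 2 raised to [5, 8, 11, 15, 19]) without needing to call a rounding function: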

uint_range =  ( 64 **  ( 1 + (k=log2(bits)-4)/2) )
              *
              (  2 ** -(  (3-k)**(2<k)         ) )

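(a) x ** y means x to the power y
(b) 2 < k is a boolean condition that should simply evaluate to 0 or 1.

The function should be accurate at least from 16-bit to 256-bit. Beyond that, the formula yields exponent sizes of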

   –  512-bit : 23 
   – 1024-bit : 27 
   – 2048-bit : 31 
   – 4096-bit : 35

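(Beyond 256 bits it may be inaccurate. Even a 27-bit-wide exponent field allows exponents of about +/- 67 million, so the representable magnitudes span over 40 million decimal orders of magnitude.)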

From there, getting to the IEEE 754 exponent width is just a matter of log2(uint_range)
