Bit shifting to pack a float into a 13-bit float in C?

Question

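For example, suppose 18.xxx is read in as a float value and passed to a function. It is truncated to 18.0. From there, it is encoded as 0 10011 0010000, which fits the required 13-bit float format, and returned as an int with the decimal value 2448. Does anyone know how to do this with bit shifts?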

Answer 1

If your floats are represented in the 32-bit IEEE 754 single-precision binary format with an unsigned exponent, this might do what you want:

#include <stdio.h>
#include <string.h>
#include <assert.h>

unsigned short float32to13(float f) {

    assert(sizeof(float) == sizeof(unsigned int));

    unsigned int g;
    memcpy(&g, &f, sizeof(float)); // allow us to examine a float bit by bit

    unsigned int sign32 = (g >> 0x1f) & 0x1; // one bit sign
    unsigned int exponent32 = ((g >> 0x17) & 0xff) - 0x7f; // unbias 8 bits of exponent
    unsigned int fraction32 = g & 0x7fffff; // 23 bits of significand 

    assert(((exponent32 + 0xf) & ~ 0x1f) == 0); // don't overflow smaller exponent

    unsigned short sign13 = sign32;
    unsigned short exponent13 = exponent32 + 0xf; // rebias exponent by smaller amount
    unsigned short fraction13 = fraction32 >> 0x10; // drop lower 16 bits of significand precision

    return sign13 << 0xc | exponent13 << 0x7 | fraction13; // assemble a float13
}

int main() {

    float f = 18.0f;

    printf("%u\n", (unsigned) float32to13(f)); // unsigned short promotes to int; cast to match %u

    return 0;
}

Output

> ./a.out
2448
>

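I leave any endianness issues and additional error checking to the end user. This example is provided only to show the OP the kind of shifting needed to convert between floating-point formats. Any resemblance to actual floating-point formats is purely coincidental.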
