Confused with bytes, blocks, en-/decoding in C++

Question

I have a 64-byte block and want to append a 64-bit (8-byte) block of data at its end.

typedef unsigned char uint1; // 1 Byte
typedef unsigned int uint4; // 4 Byte

// The 64 Byte-Block:
const int BLOCKSIZE = 64; // must be a constant expression to serve as an array size
static uint1 padding[BLOCKSIZE] = {
        0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
// [[10000000][00000000].........[00000000]]


// The 64 Bit (8 Byte-Block):
uint4 appendix[2] = {};
appendix[1] = 0x000000ff;
// [[00000000000000000000000000000000][00000000000000000000000011111111]]

After copying the 8 bytes of appendix into the last 8 bytes of padding with memcpy

memcpy(&padding[56], &appendix, 8);

padding looks like this:

static uint1 padding[BLOCKSIZE] = {
        0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0, 0, 0
    };

But shouldn't it look like this instead?

static uint1 padding[BLOCKSIZE] = {
        0x80, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff
    };

I don't know what is wrong here.

Can you help me?

Answer 1

You should read up on endianness. The layout you expect is big-endian.

Answer 2

appendix[1] = 0x000000ff;
// [[00000000000000000000000000000000][00000000000000000000000011111111]]

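You are assuming a particular byte order (endianness). You cannot make that assumption. Depending on the endianness of the architecture, appendix may just as well be laid out in memory like this: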

// [[00000000000000000000000000000000][11111111000000000000000000000000]]
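
To see which of the two layouts a given machine actually uses, the bytes of the integer can be copied out and inspected one by one. The following is only a minimal sketch; the helper names bytes_of and host_is_little_endian are made up for this illustration and do not appear in the question.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copy the in-memory bytes of a 32-bit value into an array so that
// they can be inspected individually (hypothetical helper).
std::array<unsigned char, 4> bytes_of(std::uint32_t value) {
    std::array<unsigned char, 4> out{};
    std::memcpy(out.data(), &value, sizeof value);
    return out;
}

// True when the least significant byte is stored at the lowest address,
// i.e. the host is little-endian (hypothetical helper).
bool host_is_little_endian() {
    return bytes_of(0x000000ffu)[0] == 0xff;
}
```

On a little-endian host such as x86, bytes_of(0x000000ff) yields ff 00 00 00, which is why the 0xff in the question shows up at padding[60] rather than padding[63].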

If you specifically want to set the last byte, you need to operate on bytes rather than on multi-byte integers. For example:

uint1 appendix[8] = {};
appendix[7] = 0xff;

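If you really do need the last 8 bytes to represent two 4-byte integers, then your code is correct in that respect; it is only your assumption about what the memory should look like that is wrong.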

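If the integers must be in a specific byte order, for example because they are sent over a network, then you must convert them accordingly. POSIX provides htonl and its sister functions for exactly this purpose. MSVC provides these functions as well.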
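
As a rough sketch of how htonl could be used here: network byte order is big-endian, so the converted value has its most significant byte first regardless of the host. The function name store_network_order is an assumption made for this example, and the header shown is the POSIX one (on Windows the function is declared in <winsock2.h>).

```cpp
#include <arpa/inet.h>  // htonl on POSIX systems
#include <cstdint>
#include <cstring>

// Write a 32-bit value into dst[0..3] in network (big-endian) byte order.
// store_network_order is a hypothetical helper, not part of any library.
void store_network_order(unsigned char* dst, std::uint32_t host_value) {
    const std::uint32_t net_value = htonl(host_value);
    std::memcpy(dst, &net_value, sizeof net_value);
}
```

Calling store_network_order(&padding[60], 0x000000ff) would put 0xff into padding[63] on any host, which is the layout the question expected. Note that this is big-endian and therefore not the order MD5 asks for.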

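You are also assuming that unsigned int is 4 bytes. That is not guaranteed. If you need a 4-byte integer, use a fixed-width type such as int32_t (or uint32_t for an unsigned value) instead.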
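
One possible way to write the question's typedefs with fixed-width types is sketched below. The names uint1 and uint4 are taken from the question; the static_assert is an extra safeguard added for illustration.

```cpp
#include <cstdint>

// Fixed-width replacements for the typedefs in the question.
using uint1 = std::uint8_t;   // exactly 8 bits
using uint4 = std::uint32_t;  // exactly 32 bits

// Fail at compile time instead of silently producing a wrong layout.
static_assert(sizeof(uint4) == 4, "uint4 must occupy exactly 4 bytes");
```

If a platform does not offer an exact 32-bit type, std::uint32_t is simply not defined and the code refuses to compile, which is usually preferable to a silent size mismatch.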

Update:

My Goal is to implement MD5 and I need to append a 64 bit representation of the length of a file.

According to RFC 1321:

... a sequence of bytes can be interpreted as a sequence of 32-bit words, where each consecutive group of four bytes is interpreted as a word with the low-order (least significant) byte given first.

MD5 is little-endian. Writing a 2*4-byte array without converting the byte order will therefore only work correctly on a little-endian processor.

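I recommend using an array of 8 single bytes so that you can control the byte order exactly as the specification requires. Alternatively, if you are on Linux or another platform that provides them, you can use the htole32 and le32toh functions to convert to the right byte order. On other platforms you may need to implement them yourself.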
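
The byte-array approach can be sketched with shifts, which behave the same on every host because they operate on the value rather than on its memory layout. The name store_le64 is an assumption for this example. As I read RFC 1321, the appended length b is the message length in bits, so a 64-byte message would be encoded as 512.

```cpp
#include <cstddef>
#include <cstdint>

// Store a 64-bit value into dst[0..7], least significant byte first.
// store_le64 is a hypothetical helper written for this illustration.
void store_le64(unsigned char* dst, std::uint64_t value) {
    for (std::size_t i = 0; i < 8; ++i) {
        // Shift the i-th byte down to the low 8 bits and keep only those.
        dst[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xffu);
    }
}
```

With the question's buffer this would be used as store_le64(&padding[56], bit_length). For bit_length = 512 (0x200) the last eight bytes become 00 02 00 00 00 00 00 00 on both little-endian and big-endian machines.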

Answer 3

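So, as far as I understand RFC 1321, I need a 64-bit integer representation of the size of the original message (the file). The file is 64 bytes long. As a 64-bit integer, the value 64 in binary is: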

0000000000000000000000000000000000000000000000000000000001000000

or:

0000001000000000000000000000000000000000000000000000000000000000

I have decoding functions for both, but I don't know which one is right for MD5.
