Clean way to make portable endian-correct file-reading / writing code in C++
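I want to write some C++ code that reads from and writes to a file in an endian-correct way. More precisely, I want to be able to read a particular type of file whose endianness I can easily detect (by checking whether its magic number is reversed).
But how do I then read the file correctly and cleanly? I read the following article, which gave a useful idea:
The idea is to make a class holding function pointers to the desired endian-correct read() functions. In my experience, though, function pointers are slow, especially when you have to call them as frequently as in this case. Another option is to have an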
if (file_was_detected_big_endian) { read_bigendian(); } else { read_littleendian(); }
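for every read_x_bit_int() function I have, but that seems inefficient too.
I'm using Boost, so I can use all of its facilities to help me. In particular, there is the endian sub-library: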
http://www.boost.org/doc/libs/develop/libs/endian/doc/buffers.html
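although I'm not sure how to use it cleanly to do what I want. I'd like some code where I can read, say, 16 bytes directly into a pointer to a struct representing part of the file, with the endianness corrected automatically. I could of course write this code myself, but I have a feeling a solid solution must already exist.
I'm assuming all of my structs will be manually padded and guarded against alignment issues.
Thanks!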
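There are two approaches to this problem:
- Write the file in an endian-agnostic way, and
- Add a marker, and read the file in an endian-aware way.
The first approach requires more work when writing, while the second approach makes writing "overhead-free".
Both approaches can be implemented without function pointers: thanks to virtual functions, the need for them in C++ is greatly reduced*.
The implementation of both approaches is similar: you create an abstract base class for serializing primitive data types, create an instance of a class that reads the correct byte order, and call its virtual member functions for reading and writing: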
struct PrimitiveSerializer {
    virtual ~PrimitiveSerializer() {}  // polymorphic base: allow deletion through a base pointer
    virtual void serializeInt(std::ostream& out, const int val) = 0;
    virtual void serializeChar(std::ostream& out, const char val) = 0;
    virtual void serializeString(std::ostream& out, const std::string& val) = 0;
...
    virtual int deserializeInt(std::istream& in) = 0;
    virtual char deserializeChar(std::istream& in) = 0;
    virtual std::string deserializeString(std::istream& in) = 0;
};
struct BigEndianSerializer : public PrimitiveSerializer {
...
};
struct LittleEndianSerializer : public PrimitiveSerializer {
...
};
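Which subclass you use depends on the approach. If you use the first approach (i.e. writing endian-agnostic files), you instantiate the serializer that matches the endianness of your system. If you take the second approach, you read the magic number from the file and then pick the subclass that matches the byte order of your file.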
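As a rough illustration of the second approach, here is a minimal sketch restricted to 32-bit integers. The names `Int32Serializer`, `BigEndianInt32Serializer`, `LittleEndianInt32Serializer` and `makeSerializer` are hypothetical and not part of the answer above, and stream error handling is deliberately left out.

```cpp
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down interface: 32-bit integers only, no error handling.
struct Int32Serializer {
    virtual ~Int32Serializer() = default;
    virtual void serializeInt(std::ostream& out, std::int32_t val) const = 0;
    virtual std::int32_t deserializeInt(std::istream& in) const = 0;
};

struct BigEndianInt32Serializer : Int32Serializer {
    void serializeInt(std::ostream& out, std::int32_t val) const override {
        const std::uint32_t u = static_cast<std::uint32_t>(val);
        for (int shift = 24; shift >= 0; shift -= 8)   // most significant byte first
            out.put(static_cast<char>((u >> shift) & 0xff));
    }
    std::int32_t deserializeInt(std::istream& in) const override {
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i)
            u = (u << 8) | static_cast<unsigned char>(in.get());
        return static_cast<std::int32_t>(u);
    }
};

struct LittleEndianInt32Serializer : Int32Serializer {
    void serializeInt(std::ostream& out, std::int32_t val) const override {
        const std::uint32_t u = static_cast<std::uint32_t>(val);
        for (int shift = 0; shift <= 24; shift += 8)   // least significant byte first
            out.put(static_cast<char>((u >> shift) & 0xff));
    }
    std::int32_t deserializeInt(std::istream& in) const override {
        std::uint32_t u = 0;
        for (int i = 0; i < 4; ++i)
            u |= static_cast<std::uint32_t>(static_cast<unsigned char>(in.get())) << (8 * i);
        return static_cast<std::int32_t>(u);
    }
};

// Pick the subclass once, from the byte order detected via the magic number.
std::unique_ptr<Int32Serializer> makeSerializer(bool fileIsBigEndian) {
    if (fileIsBigEndian)
        return std::unique_ptr<Int32Serializer>(new BigEndianInt32Serializer);
    return std::unique_ptr<Int32Serializer>(new LittleEndianInt32Serializer);
}
```

Because both subclasses assemble values with shifts rather than copying raw memory, the sketch does not need to know the byte order of the machine it runs on.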
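In addition, the first approach can be implemented using the hton / ntoh family of functions.
* Function pointers are not "slow" in themselves, although they do make it easier to write inefficient code.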
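For the first approach, a fixed on-disk byte order can also be produced with plain shifts, which avoids depending on the POSIX header that provides `htonl`/`ntohl`. The sketch below assumes the file stores 32-bit values most significant byte first; `write_u32_be` and `read_u32_be` are hypothetical helper names.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Always writes the most significant byte first, whatever the host byte order.
void write_u32_be(std::ostream& out, std::uint32_t v) {
    const char bytes[4] = {
        static_cast<char>((v >> 24) & 0xff),
        static_cast<char>((v >> 16) & 0xff),
        static_cast<char>((v >> 8) & 0xff),
        static_cast<char>(v & 0xff)
    };
    out.write(bytes, 4);
}

// Reads four bytes and reassembles them; returns false if the stream ran short.
bool read_u32_be(std::istream& in, std::uint32_t& v) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4))
        return false;
    v = (static_cast<std::uint32_t>(b[0]) << 24) |
        (static_cast<std::uint32_t>(b[1]) << 16) |
        (static_cast<std::uint32_t>(b[2]) << 8) |
        static_cast<std::uint32_t>(b[3]);
    return true;
}
```

With helpers like these the reader never needs a magic number to discover the byte order, at the price of doing the conversion on every write as well as every read.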
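I wrote a small .h and .cpp that now handle (probably) all of the endianness issues. Although I have tailored the functions to my own application, they may be helpful to someone.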
endian_bis.h:
/**
* endian_bis.h - endian-gnostic binary input stream functions
* Copyright (C) 2015
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/
#pragma once
#include <cstdint>
#include <istream>
class BinaryInputStream {
public:
inline int8_t read_int8(std::istream &in) { char buf[1]; in.read(buf, 1); return read_int8(buf, 0); }
inline int16_t read_int16(std::istream &in) { char buf[2]; in.read(buf, 2); return read_int16(buf, 0); }
inline int32_t read_int32(std::istream &in) { char buf[4]; in.read(buf, 4); return read_int32(buf, 0); }
inline int64_t read_int64(std::istream &in) { char buf[8]; in.read(buf, 8); return read_int64(buf, 0); }
inline uint8_t read_uint8(std::istream &in) { char buf[1]; in.read(buf, 1); return read_uint8(buf, 0); }
inline uint16_t read_uint16(std::istream &in) { char buf[2]; in.read(buf, 2); return read_uint16(buf, 0); }
inline uint32_t read_uint32(std::istream &in) { char buf[4]; in.read(buf, 4); return read_uint32(buf, 0); }
inline uint64_t read_uint64(std::istream &in) { char buf[8]; in.read(buf, 8); return read_uint64(buf, 0); }
inline float read_float(std::istream &in) { char buf[4]; in.read(buf, 4); return read_float(buf, 0); }
inline double read_double(std::istream &in) { char buf[8]; in.read(buf, 8); return read_double(buf, 0); }
inline int8_t read_int8(char buf[], int off) { return (int8_t)buf[off]; }
inline uint8_t read_uint8(char buf[], int off) { return (uint8_t)buf[off]; }
virtual int16_t read_int16(char buf[], int off) = 0;
virtual int32_t read_int32(char buf[], int off) = 0;
virtual int64_t read_int64(char buf[], int off) = 0;
virtual uint16_t read_uint16(char buf[], int off) = 0;
virtual uint32_t read_uint32(char buf[], int off) = 0;
virtual uint64_t read_uint64(char buf[], int off) = 0;
virtual float read_float(char buf[], int off) = 0;
virtual double read_double(char buf[], int off) = 0;
static BinaryInputStream *endianCorrectStream(int streamIsBigEndian);
static BinaryInputStream *endianCorrectStream(std::istream &in,
uint32_t expectedBigEndianMagic,
uint32_t expectedLittleEndianMagic);
};
endian_bis.cpp:
/**
* endian_bis.cpp - endian-gnostic binary input stream functions
* Copyright (C) 2015 Jonah Schreiber (jonah.schreiber@gmail.com)
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/
#include "endian_bis.h"
#include <cstring>
/*
* Delegated functions
*/
static inline int16_t read_be_int16(char buf[], int off) {
return (int16_t)(((buf[off] & 0xff) << 8) |
((buf[off+1] & 0xff)));
}
static inline int32_t read_be_int32(char buf[], int off) {
return (int32_t)(((buf[off] & 0xff) << 24) |
((buf[off+1] & 0xff) << 16) |
((buf[off+2] & 0xff) << 8) |
((buf[off+3] & 0xff)));
}
// The template argument (callers pass sizeof(size_t)) is kept only so the existing call
// sites compile; the byte-wise loops below do not depend on the machine word size.
// Each byte is widened to 64 bits before shifting, so no 32-bit half is sign-extended.
template<int> static inline int64_t read_be_int64(char buf[], int off) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | (uint64_t)(buf[off+i] & 0xff);
    return (int64_t)v;
}
static inline uint16_t read_be_uint16(char buf[], int off) {
    return (uint16_t)(((buf[off] & 0xff) << 8) |
                      ((buf[off+1] & 0xff)));
}
static inline uint32_t read_be_uint32(char buf[], int off) {
    return (uint32_t)(((buf[off] & 0xff) << 24) |
                      ((buf[off+1] & 0xff) << 16) |
                      ((buf[off+2] & 0xff) << 8) |
                      ((buf[off+3] & 0xff)));
}
template<int> static inline uint64_t read_be_uint64(char buf[], int off) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | (uint64_t)(buf[off+i] & 0xff);
    return v;
}
inline static int16_t read_le_int16(char buf[], int off) {
    return (int16_t)(((buf[off+1] & 0xff) << 8) |
                     ((buf[off] & 0xff)));
}
inline static int32_t read_le_int32(char buf[], int off) {
    return (int32_t)(((buf[off+3] & 0xff) << 24) |
                     ((buf[off+2] & 0xff) << 16) |
                     ((buf[off+1] & 0xff) << 8) |
                     ((buf[off] & 0xff)));
}
template<int> static inline int64_t read_le_int64(char buf[], int off) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)   // last byte in the buffer is the most significant
        v = (v << 8) | (uint64_t)(buf[off+i] & 0xff);
    return (int64_t)v;
}
inline static uint16_t read_le_uint16(char buf[], int off) {
    return (uint16_t)(((buf[off+1] & 0xff) << 8) |
                      ((buf[off] & 0xff)));
}
inline static uint32_t read_le_uint32(char buf[], int off) {
    return (uint32_t)(((buf[off+3] & 0xff) << 24) |
                      ((buf[off+2] & 0xff) << 16) |
                      ((buf[off+1] & 0xff) << 8) |
                      ((buf[off] & 0xff)));
}
template<int> static inline uint64_t read_le_uint64(char buf[], int off) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | (uint64_t)(buf[off+i] & 0xff);
    return v;
}
inline static float read_matching_float(char buf[], int off) {
float f;
memcpy(&f, &buf[off], 4);
return f;
}
inline static float read_mismatched_float(char buf[], int off) {
    float f;
    // reverse the four bytes starting at off (the offset was previously ignored)
    char buf2[4] = {buf[off+3], buf[off+2], buf[off+1], buf[off]};
    memcpy(&f, buf2, 4);
    return f;
}
inline static double read_matching_double(char buf[], int off) {
double d;
memcpy(&d, &buf[off], 8);
return d;
}
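inline static double read_mismatched_double(char buf[], int off) {
    double d;
    // reverse the eight bytes starting at off (the offset was previously ignored)
    char buf2[8] = {buf[off+7], buf[off+6], buf[off+5], buf[off+4],
                    buf[off+3], buf[off+2], buf[off+1], buf[off]};
    memcpy(&d, buf2, 8);   // copy all 8 bytes of the double, not just 4
    return d;
}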
/*
* Types (singleton instantiations)
*/
/*
* Big-endian stream, Big-endian runtime
*/
static class : public BinaryInputStream {
public:
int16_t read_int16(char buf[], int off) { return read_be_int16(buf, off); }
int32_t read_int32(char buf[], int off) { return read_be_int32(buf, off); }
int64_t read_int64(char buf[], int off) { return read_be_int64<sizeof(size_t)>(buf, off); }
uint16_t read_uint16(char buf[], int off) { return read_be_uint16(buf, off); }
uint32_t read_uint32(char buf[], int off) { return read_be_uint32(buf, off); }
uint64_t read_uint64(char buf[], int off) { return read_be_uint64<sizeof(size_t)>(buf, off); }
float read_float(char buf[], int off) { return read_matching_float(buf, off); }
double read_double(char buf[], int off) { return read_matching_double(buf, off); }
} beStreamBeRuntime;
/*
* Big-endian stream, Little-endian runtime
*/
static class : public BinaryInputStream {
public:
int16_t read_int16(char buf[], int off) { return read_be_int16(buf, off); }
int32_t read_int32(char buf[], int off) { return read_be_int32(buf, off); }
int64_t read_int64(char buf[], int off) { return read_be_int64<sizeof(size_t)>(buf, off); }
uint16_t read_uint16(char buf[], int off) { return read_be_uint16(buf, off); }
uint32_t read_uint32(char buf[], int off) { return read_be_uint32(buf, off); }
uint64_t read_uint64(char buf[], int off) { return read_be_uint64<sizeof(size_t)>(buf, off); }
float read_float(char buf[], int off) { return read_mismatched_float(buf, off); }
double read_double(char buf[], int off) { return read_mismatched_double(buf, off); }
} beStreamLeRuntime;
/*
* Little-endian stream, Big-endian runtime
*/
static class : public BinaryInputStream {
public:
int16_t read_int16(char buf[], int off) { return read_le_int16(buf, off); }
int32_t read_int32(char buf[], int off) { return read_le_int32(buf, off); }
int64_t read_int64(char buf[], int off) { return read_le_int64<sizeof(size_t)>(buf, off); }
uint16_t read_uint16(char buf[], int off) { return read_le_uint16(buf, off); }
uint32_t read_uint32(char buf[], int off) { return read_le_uint32(buf, off); }
uint64_t read_uint64(char buf[], int off) { return read_le_uint64<sizeof(size_t)>(buf, off); }
float read_float(char buf[], int off) { return read_mismatched_float(buf, off); }
double read_double(char buf[], int off) { return read_mismatched_double(buf, off); }
} leStreamBeRuntime;
/*
* Little-endian stream, Little-endian runtime
*/
static class : public BinaryInputStream {
public:
int16_t read_int16(char buf[], int off) { return read_le_int16(buf, off); }
int32_t read_int32(char buf[], int off) { return read_le_int32(buf, off); }
int64_t read_int64(char buf[], int off) { return read_le_int64<sizeof(size_t)>(buf, off); }
uint16_t read_uint16(char buf[], int off) { return read_le_uint16(buf, off); }
uint32_t read_uint32(char buf[], int off) { return read_le_uint32(buf, off); }
uint64_t read_uint64(char buf[], int off) { return read_le_uint64<sizeof(size_t)>(buf, off); }
float read_float(char buf[], int off) { return read_matching_float(buf, off); }
double read_double(char buf[], int off) { return read_matching_double(buf, off); }
} leStreamLeRuntime;
/*
* "Factory" singleton methods (plus helper)
*/
static inline int isRuntimeBigEndian() {
    const uint32_t probe = 0x01020304;
    unsigned char first;
    memcpy(&first, &probe, 1);   // inspect the lowest-addressed byte without union type punning
    return first == 0x01;
}
BinaryInputStream *BinaryInputStream::endianCorrectStream(int streamIsBigEndian) {
if (streamIsBigEndian) {
if (isRuntimeBigEndian()) {
return &beStreamBeRuntime;
} else {
return &beStreamLeRuntime;
}
} else {
if (isRuntimeBigEndian()) {
return &leStreamBeRuntime;
} else {
return &leStreamLeRuntime;
}
}
}
BinaryInputStream *BinaryInputStream::endianCorrectStream(std::istream &in,
uint32_t expectedBigEndianMagic,
uint32_t expectedLittleEndianMagic) {
uint32_t magic = ((BinaryInputStream*)&beStreamBeRuntime)->read_uint32(in);
if (magic == expectedBigEndianMagic) {
if (isRuntimeBigEndian()) {
return &beStreamBeRuntime;
} else {
return &beStreamLeRuntime;
}
} else if (magic == expectedLittleEndianMagic) {
if (isRuntimeBigEndian()) {
return &leStreamBeRuntime;
} else {
return &leStreamLeRuntime;
}
} else {
return 0; /* not expected magic number */
}
}
Suggested usage:
BinaryInputStream *bis = BinaryInputStream::endianCorrectStream(in, 0x01020304, 0x04030201);
if (bis == 0) {
cerr << "error: infile is not an Acme EarthQUAKEZ file" << endl;
return 1;
}
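in.ignore(4); // note: endianCorrectStream() above has already consumed the 4-byte magic, so this skips the 4 bytes that follow it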
int32_t number = bis->read_int32(in);
...
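The magic-number check on its own can be sketched in a few self-contained lines. This is an illustrative variant, not the factory above: `FileOrder` and `detect_order` are hypothetical names, and it takes a single expected magic value and derives the byte-swapped form itself.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

enum class FileOrder { Big, Little, Unknown };

// Reads the first four bytes as a big-endian value and compares the result
// against the expected magic number and against its byte-swapped form.
// On success the stream is left positioned just after the magic number.
FileOrder detect_order(std::istream& in, std::uint32_t magic) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4))
        return FileOrder::Unknown;   // file shorter than the magic number
    const std::uint32_t asBig =
        (static_cast<std::uint32_t>(b[0]) << 24) |
        (static_cast<std::uint32_t>(b[1]) << 16) |
        (static_cast<std::uint32_t>(b[2]) << 8) |
        static_cast<std::uint32_t>(b[3]);
    const std::uint32_t swapped =
        ((magic & 0x000000ffu) << 24) | ((magic & 0x0000ff00u) << 8) |
        ((magic >> 8) & 0x0000ff00u) | (magic >> 24);
    if (asBig == magic)   return FileOrder::Big;
    if (asBig == swapped) return FileOrder::Little;
    return FileOrder::Unknown;
}
```

A palindromic magic number (one that reads the same when byte-swapped) cannot distinguish the two orders, so a format relying on this check needs an asymmetric value such as 0x01020304.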
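The virtual-function approach proposed by dasblinkenlight is probably sufficient, especially since I/O is likely to be the dominant cost. However, if you do find that your read functions take up a lot of CPU time, you can eliminate the virtual dispatch overhead by templatizing your file reader.
Here is a rough sketch demonstrating this:
Basically, create two reader classes, one per byte order: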
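#include <cstdint>
#include <cstring>
#include <istream>
class LittleReader {
public:
    explicit LittleReader(std::istream& is) : m_is(is) {}
    char read_char() { char c = 0; m_is.read(&c, 1); return c; }   // read one byte from m_is
    std::int32_t read_int32() {                                    // least significant byte first
        unsigned char b[4] = {};
        m_is.read(reinterpret_cast<char*>(b), 4);
        return static_cast<std::int32_t>(std::uint32_t(b[0]) | std::uint32_t(b[1]) << 8 | std::uint32_t(b[2]) << 16 | std::uint32_t(b[3]) << 24);
    }
    float read_float() {                                           // assumes a 32-bit IEEE-754 float
        const std::int32_t bits = read_int32(); float f;
        std::memcpy(&f, &bits, sizeof f); return f;
    }
private:
    std::istream& m_is;
};
class BigReader {
public:
    explicit BigReader(std::istream& is) : m_is(is) {}
    char read_char() { char c = 0; m_is.read(&c, 1); return c; }
    std::int32_t read_int32() {                                    // most significant byte first
        unsigned char b[4] = {};
        m_is.read(reinterpret_cast<char*>(b), 4);
        return static_cast<std::int32_t>(std::uint32_t(b[0]) << 24 | std::uint32_t(b[1]) << 16 | std::uint32_t(b[2]) << 8 | std::uint32_t(b[3]));
    }
    float read_float() {                                           // assumes a 32-bit IEEE-754 float
        const std::int32_t bits = read_int32(); float f;
        std::memcpy(&f, &bits, sizeof f); return f;
    }
private:
    std::istream& m_is;
};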
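Separate the main part of the reading logic (everything except the magic-number bit) into a function template that takes an instance of one of the above classes as a parameter: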
template <class Reader>
void read_endian(Reader &rdr) {
    auto field1 = rdr.read_int32();   // fields are declared locally here; store them wherever your format needs
    auto field2 = rdr.read_float();
    // process rest of data file
    ...
}
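In essence, the compiler will create two implementations of your read_endian function, one for each byte order. Since there is no dynamic dispatch, the compiler can also inline all the calls to read_int32, read_float, and so on.
Finally, in your main reader function, look at the magic number to determine which kind of reader to instantiate: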
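void read_file(std::istream& is) {
    int magic(read_magic_no(is));
    if (magic == MAGIC_BIG_ENDIAN) {
        BigReader rdr(is);      // named object: a temporary cannot bind to read_endian's non-const reference
        read_endian(rdr);
    } else {
        LittleReader rdr(is);
        read_endian(rdr);
    }
}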
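This technique gives you flexibility without incurring any virtual dispatch overhead, at the cost of increased (binary) code size. It can be very useful when you have very tight loops and need to squeeze out every last drop of performance.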
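To make the template technique concrete, here is a compact end-to-end sketch. Everything in it is an assumption made for illustration: the policy types `BigOrder` and `LittleOrder`, the `Reader` template, the two-field `Record` layout, and the magic value 0x01020304. It also assumes 32-bit IEEE-754 floats in both the file and the host.

```cpp
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Byte-order policies: shift(i) is the bit position of the i-th byte in the file.
struct BigOrder    { static int shift(int i) { return 8 * (3 - i); } };
struct LittleOrder { static int shift(int i) { return 8 * i; } };

template <class Order>
class Reader {
public:
    explicit Reader(std::istream& is) : m_is(is) {}
    std::uint32_t read_uint32() {
        unsigned char b[4] = {};
        m_is.read(reinterpret_cast<char*>(b), 4);
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v |= static_cast<std::uint32_t>(b[i]) << Order::shift(i);
        return v;
    }
    float read_float() {
        const std::uint32_t bits = read_uint32();
        float f;
        std::memcpy(&f, &bits, sizeof f);   // reinterpret the assembled bit pattern
        return f;
    }
private:
    std::istream& m_is;
};

struct Record { std::uint32_t id; float value; };   // hypothetical file layout

// One instantiation per byte order; the calls are resolved at compile time.
template <class R>
Record read_record(R& rdr) {
    Record r;
    r.id = rdr.read_uint32();
    r.value = rdr.read_float();
    return r;
}

// The only runtime branch: inspect the magic number once, then commit.
Record read_file(std::istream& is) {
    Reader<BigOrder> big(is);
    const std::uint32_t magic = big.read_uint32();
    if (magic == 0x01020304u)
        return read_record(big);
    if (magic == 0x04030201u) {
        Reader<LittleOrder> little(is);
        return read_record(little);
    }
    throw std::runtime_error("unexpected magic number");
}
```

The byte order is decided once per file rather than once per field, which is what lets the compiler inline the per-field reads inside each instantiation of `read_record`.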