C++ 从二进制文件中读回 "incorrect" 值?
C++ Reading back "incorrect" values from binary file?
我正在处理的项目,作为一种自定义文件格式,由 header 几个不同的变量组成,后跟像素数据。我的同事开发了一个 GUI,处理、写入、读取和显示这种类型的文件格式都可以正常工作。
但我的问题是,虽然我协助编写了将数据写入磁盘的代码,但我无法自己读取这种文件并获得令人满意的值。我能够读回第一个变量(字符数组)但不能读出以下值。
因此文件格式符合以下结构:
typedef struct {
char hxtLabel[8];
u64 hxtVersion;
int motorPositions[9];
int filePrefixLength;
char filePrefix[100];
..
} HxtBuffer;
在代码中,我创建了一个上述结构的 object,然后设置这些示例值:
setLabel("MY_LABEL");
setFormatVersion(3);
setMotorPosition( 2109, 5438, 8767, 1234, 1022, 1033, 1044, 1055, 1066);
setFilePrefixLength(7);
setFilePrefix( string("prefix_"));
setDataTimeStamp( string("000000_000000"));
我打开文件的代码:
// Open data file, binary mode, reading
ifstream datFile(aFileName.c_str(), ios::in | ios::binary);
if (!datFile.is_open()) {
cout << "readFile() ERROR: Failed to open file " << aFileName << endl;
return false;
}
// How large is the file?
datFile.seekg(0, datFile.end);
int length = datFile.tellg();
datFile.seekg(0, datFile.beg);
cout << "readFile() file " << setw(70) << aFileName << " is: " << setw(15) << length << " long\n";
// Allocate memory for buffer:
char * buffer = new char[length];
// Read data as one block:
datFile.read(buffer, length);
datFile.close();
/// Looking at the start of the buffer, I should be seeing "MY_LABEL"?
cout << "buffer: " << buffer << " " << *(buffer) << endl;
int* mSSX = reinterpret_cast<int*>(*(buffer+8));
int* mSSY = reinterpret_cast<int*>(&buffer+9);
int* mSSZ = reinterpret_cast<int*>(&buffer+10);
int* mSSROT = reinterpret_cast<int*>(&buffer+11);
int* mTimer = reinterpret_cast<int*>(&buffer+12);
int* mGALX = reinterpret_cast<int*>(&buffer+13);
int* mGALY = reinterpret_cast<int*>(&buffer+14);
int* mGALZ = reinterpret_cast<int*>(&buffer+15);
int* mGALROT = reinterpret_cast<int*>(&buffer+16);
int* filePrefixLength = reinterpret_cast<int*>(&buffer+17);
std::string filePrefix; std::string dataTimeStamp;
// Read file prefix character by character into stringstream object
std::stringstream ss;
char* cPointer = (char *)(buffer+18);
int k;
for(k = 0; k < *filePrefixLength; k++)
{
//read string
char c;
c = *cPointer;
ss << c;
cPointer++;
}
filePrefix = ss.str();
// Read timestamp character by character into stringstream object
std::stringstream timeStampStream;
/// Need not increment cPointer, already pointing @ 1st char of timeStamp
for (int l= 0; l < 13; l++)
{
char c;
c = * cPointer;
timeStampStream << c;
}
dataTimeStamp = timeStampStream.str();
cout << 25 << endl;
cout << " mSSX: " << mSSX << " mSSY: " << mSSY << " mSSZ: " << mSSZ;
cout << " mSSROT: " << mSSROT << " mTimer: " << mTimer << " mGALX: " << mGALX;
cout << " mGALY: " << mGALY << " mGALZ: " << mGALZ << " mGALROT: " << mGALROT;
最后,我看到的是下面这个。我添加 25 只是为了仔细检查并非所有内容都是十六进制的。如您所见,我能够按预期看到标签 "MY_LABEL"。但是 9 个 motorPositions 都看起来可疑地看起来地址不是值。文件前缀和数据时间戳(应该是字符串,或者至少是字符)都是空的。
buffer: MY_LABEL M
25
mSSX: 0000000000000003 mSSY: 00000000001BF618 mSSZ: 00000000001BF620 mSSROT: 00000000001BF628 mTimer: 00000000001BF630 mGALX: 00000000001BF638 mGALY: 00000000001BF640 mGALZ: 00000000001BF648 mGALROT: 00000000001BF650filePrefix: dataTimeStamp:
我确定解决方案不会太复杂,但我到了一个阶段,我只是在旋转,无法理解事情。
非常感谢阅读这篇有点长的文章post。
-- 编辑--
我可能会达到 post 允许的最大长度,但以防万一我认为我应该 post 生成我正在尝试读回的数据的代码:
bool writePixelOutput(string aOutputPixelFileName) {
// Write pixel histograms out to binary file
ofstream pixelFile;
pixelFile.open(aOutputPixelFileName.c_str(), ios::binary | ios::out | ios::trunc);
if (!pixelFile.is_open()) {
LOG(gLogConfig, logERROR) << "Failed to open output file " << aOutputPixelFileName;
return false;
}
// Write binary file header
string label("MY_LABEL");
pixelFile.write(label.c_str(), label.length());
pixelFile.write((const char*)&mFormatVersion, sizeof(u64));
// Include File Prefix/Motor Positions/Data Time Stamp - if format version > 1
if (mFormatVersion > 1)
{
pixelFile.write((const char*)&mSSX, sizeof(mSSX));
pixelFile.write((const char*)&mSSY, sizeof(mSSY));
pixelFile.write((const char*)&mSSZ, sizeof(mSSZ));
pixelFile.write((const char*)&mSSROT, sizeof(mSSROT));
pixelFile.write((const char*)&mTimer, sizeof(mTimer));
pixelFile.write((const char*)&mGALX, sizeof(mGALX));
pixelFile.write((const char*)&mGALY, sizeof(mGALY));
pixelFile.write((const char*)&mGALZ, sizeof(mGALZ));
pixelFile.write((const char*)&mGALROT, sizeof(mGALROT));
// Determine length of mFilePrefix string
int filePrefixSize = (int)mFilePrefix.size();
// Write prefix length, followed by prefix itself
pixelFile.write((const char*)&filePrefixSize, sizeof(filePrefixSize));
size_t prefixLen = 0;
if (mFormatVersion == 2) prefixLen = mFilePrefix.size();
else prefixLen = 100;
pixelFile.write(mFilePrefix.c_str(), prefixLen);
pixelFile.write(mDataTimeStamp.c_str(), mDataTimeStamp.size());
}
// Continue writing header information that is common to both format versions
pixelFile.write((const char*)&mRows, sizeof(mRows));
pixelFile.write((const char*)&mCols, sizeof(mCols));
pixelFile.write((const char*)&mHistoBins, sizeof(mHistoBins));
// Write the actual data - taken out for briefy sake
// ..
pixelFile.close();
LOG(gLogConfig, logINFO) << "Written output histogram binary file " << aOutputPixelFileName;
return true;
}
-- 编辑 2 (11:32 09/12/2015) --
感谢您的所有帮助,我现在离解决问题更近了。根据 muelleth 的回答,我尝试:
/// Read into char buffer
char * buffer = new char[length];
datFile.read(buffer, length);// length determined by ifstream.seekg()
/// Let's try HxtBuffer
HxtBuffer *input = new HxtBuffer;
cout << "sizeof HxtBuffer: " << sizeof *input << endl;
memcpy(input, buffer, length);
然后我可以显示不同的结构变量:
qDebug() << "Slice BUFFER label " << QString::fromStdString(input->hxtLabel);
qDebug() << "Slice BUFFER version " << QString::number(input->hxtVersion);
qDebug() << "Slice BUFFER hxtPrefixLength " << QString::number(input->filePrefixLength);
for (int i = 0; i < 9; i++)
{
qDebug() << i << QString::number(input->motorPositions[i]);
}
qDebug() << "Slice BUFFER filePrefix " << QString::fromStdString(input->filePrefix);
qDebug() << "Slice BUFFER dataTimeStamp " << QString::fromStdString(input->dataTimeStamp);
qDebug() << "Slice BUFFER nRows " << QString::number(input->nRows);
qDebug() << "Slice BUFFER nCols " << QString::number(input->nCols);
qDebug() << "Slice BUFFER nBins " << QString::number(input->nBins);
然后输出基本符合预期:
Slice BUFFER label "MY_LABEL"
Slice BUFFER version "3"
Slice BUFFER hxtPrefixLength "2"
0 "2109"
1 "5438"
...
7 "1055"
8 "1066"
Slice BUFFER filePrefix "-1"
Slice BUFFER dataTimeStamp "000000_000000P"
Slice BUFFER nRows "20480"
Slice BUFFER nCols "256000"
Slice BUFFER nBins "0"
EXCEPT,dataTimeStamp
,长度为 13 个字符,显示为 14 个字符。随后的 3 个变量:nRows
、nCols
和 nBins
则不正确。 (应该是 nRows=80,nCols=80,nBins=1000)。我的猜测是属于 dataTimeStamp
的第 14 个字符的位应该与 nRows
一起读取,因此级联以产生正确的 nCols
和 nBins
。
我已经使用 qDebug 单独验证(此处未显示)我写入文件的内容确实是我期望的值及其各自的大小。
我个人会尝试从文件中准确读取结构的字节数,即
int length = sizeof(HxtBuffer);
然后简单地使用 memcpy 从读取缓冲区分配一个本地结构:
HxtBuffer input;
memcpy(&input, buffer, length);
然后您可以访问您的数据,例如喜欢:
std::cout << "Data: " << input.hxtLabel << std::endl;
为什么读到缓冲,而不是使用结构体读?
HxtBuffer data;
datFile.read(reinterpret_cast<char *>(&data), sizeof data);
if(datFile && datFile.gcount()!=sizeof data)
throw io_exception();
// Can use data.
如果您想读取字符缓冲区,那么您获取数据的方式就是错误的。你可能想做这样的事情。
char *buf_offset=buffer+8+sizeof(u64); // Skip label (8 chars) and version (int64)
int mSSX = *reinterpret_cast<int*>(buf_offset);
buf_offset+=sizeof(int);
int mSSY = *reinterpret_cast<int*>(buf_offset);
buf_offset+=sizeof(int);
int mSSZ = *reinterpret_cast<int*>(buf_offset);
/* etc. */
或者,更好一点(前提是您不更改缓冲区的内容)。
int *ptr_motors=reinterpret_cast<int *>(buffer+8+sizeof(u64));
int &mSSX = ptr_motors[0];
int &mSSY = ptr_motors[1];
int &mSSZ = ptr_motors[2];
/* etc. */
请注意,我没有将 mSSX
、mSSY
等声明为指针。您的代码将它们打印为地址,因为 you 告诉编译器它们是地址(指针)。
我正在处理的项目,作为一种自定义文件格式,由 header 几个不同的变量组成,后跟像素数据。我的同事开发了一个 GUI,处理、写入、读取和显示这种类型的文件格式都可以正常工作。
但我的问题是,虽然我协助编写了将数据写入磁盘的代码,但我无法自己读取这种文件并获得令人满意的值。我能够读回第一个变量(字符数组)但不能读出以下值。
因此文件格式符合以下结构:
typedef struct {
char hxtLabel[8];
u64 hxtVersion;
int motorPositions[9];
int filePrefixLength;
char filePrefix[100];
..
} HxtBuffer;
在代码中,我创建了一个上述结构的 object,然后设置这些示例值:
setLabel("MY_LABEL");
setFormatVersion(3);
setMotorPosition( 2109, 5438, 8767, 1234, 1022, 1033, 1044, 1055, 1066);
setFilePrefixLength(7);
setFilePrefix( string("prefix_"));
setDataTimeStamp( string("000000_000000"));
我打开文件的代码:
// Open data file, binary mode, reading
ifstream datFile(aFileName.c_str(), ios::in | ios::binary);
if (!datFile.is_open()) {
cout << "readFile() ERROR: Failed to open file " << aFileName << endl;
return false;
}
// How large is the file?
datFile.seekg(0, datFile.end);
int length = datFile.tellg();
datFile.seekg(0, datFile.beg);
cout << "readFile() file " << setw(70) << aFileName << " is: " << setw(15) << length << " long\n";
// Allocate memory for buffer:
char * buffer = new char[length];
// Read data as one block:
datFile.read(buffer, length);
datFile.close();
/// Looking at the start of the buffer, I should be seeing "MY_LABEL"?
cout << "buffer: " << buffer << " " << *(buffer) << endl;
int* mSSX = reinterpret_cast<int*>(*(buffer+8));
int* mSSY = reinterpret_cast<int*>(&buffer+9);
int* mSSZ = reinterpret_cast<int*>(&buffer+10);
int* mSSROT = reinterpret_cast<int*>(&buffer+11);
int* mTimer = reinterpret_cast<int*>(&buffer+12);
int* mGALX = reinterpret_cast<int*>(&buffer+13);
int* mGALY = reinterpret_cast<int*>(&buffer+14);
int* mGALZ = reinterpret_cast<int*>(&buffer+15);
int* mGALROT = reinterpret_cast<int*>(&buffer+16);
int* filePrefixLength = reinterpret_cast<int*>(&buffer+17);
std::string filePrefix; std::string dataTimeStamp;
// Read file prefix character by character into stringstream object
std::stringstream ss;
char* cPointer = (char *)(buffer+18);
int k;
for(k = 0; k < *filePrefixLength; k++)
{
//read string
char c;
c = *cPointer;
ss << c;
cPointer++;
}
filePrefix = ss.str();
// Read timestamp character by character into stringstream object
std::stringstream timeStampStream;
/// Need not increment cPointer, already pointing @ 1st char of timeStamp
for (int l= 0; l < 13; l++)
{
char c;
c = * cPointer;
timeStampStream << c;
}
dataTimeStamp = timeStampStream.str();
cout << 25 << endl;
cout << " mSSX: " << mSSX << " mSSY: " << mSSY << " mSSZ: " << mSSZ;
cout << " mSSROT: " << mSSROT << " mTimer: " << mTimer << " mGALX: " << mGALX;
cout << " mGALY: " << mGALY << " mGALZ: " << mGALZ << " mGALROT: " << mGALROT;
最后,我看到的是下面这个。我添加 25 只是为了仔细检查并非所有内容都是十六进制的。如您所见,我能够按预期看到标签 "MY_LABEL"。但是 9 个 motorPositions 都看起来可疑地看起来地址不是值。文件前缀和数据时间戳(应该是字符串,或者至少是字符)都是空的。
buffer: MY_LABEL M
25
mSSX: 0000000000000003 mSSY: 00000000001BF618 mSSZ: 00000000001BF620 mSSROT: 00000000001BF628 mTimer: 00000000001BF630 mGALX: 00000000001BF638 mGALY: 00000000001BF640 mGALZ: 00000000001BF648 mGALROT: 00000000001BF650filePrefix: dataTimeStamp:
我确定解决方案不会太复杂,但我到了一个阶段,我只是在旋转,无法理解事情。
非常感谢阅读这篇有点长的文章post。
-- 编辑--
我可能会达到 post 允许的最大长度,但以防万一我认为我应该 post 生成我正在尝试读回的数据的代码:
bool writePixelOutput(string aOutputPixelFileName) {
// Write pixel histograms out to binary file
ofstream pixelFile;
pixelFile.open(aOutputPixelFileName.c_str(), ios::binary | ios::out | ios::trunc);
if (!pixelFile.is_open()) {
LOG(gLogConfig, logERROR) << "Failed to open output file " << aOutputPixelFileName;
return false;
}
// Write binary file header
string label("MY_LABEL");
pixelFile.write(label.c_str(), label.length());
pixelFile.write((const char*)&mFormatVersion, sizeof(u64));
// Include File Prefix/Motor Positions/Data Time Stamp - if format version > 1
if (mFormatVersion > 1)
{
pixelFile.write((const char*)&mSSX, sizeof(mSSX));
pixelFile.write((const char*)&mSSY, sizeof(mSSY));
pixelFile.write((const char*)&mSSZ, sizeof(mSSZ));
pixelFile.write((const char*)&mSSROT, sizeof(mSSROT));
pixelFile.write((const char*)&mTimer, sizeof(mTimer));
pixelFile.write((const char*)&mGALX, sizeof(mGALX));
pixelFile.write((const char*)&mGALY, sizeof(mGALY));
pixelFile.write((const char*)&mGALZ, sizeof(mGALZ));
pixelFile.write((const char*)&mGALROT, sizeof(mGALROT));
// Determine length of mFilePrefix string
int filePrefixSize = (int)mFilePrefix.size();
// Write prefix length, followed by prefix itself
pixelFile.write((const char*)&filePrefixSize, sizeof(filePrefixSize));
size_t prefixLen = 0;
if (mFormatVersion == 2) prefixLen = mFilePrefix.size();
else prefixLen = 100;
pixelFile.write(mFilePrefix.c_str(), prefixLen);
pixelFile.write(mDataTimeStamp.c_str(), mDataTimeStamp.size());
}
// Continue writing header information that is common to both format versions
pixelFile.write((const char*)&mRows, sizeof(mRows));
pixelFile.write((const char*)&mCols, sizeof(mCols));
pixelFile.write((const char*)&mHistoBins, sizeof(mHistoBins));
// Write the actual data - taken out for briefy sake
// ..
pixelFile.close();
LOG(gLogConfig, logINFO) << "Written output histogram binary file " << aOutputPixelFileName;
return true;
}
-- 编辑 2 (11:32 09/12/2015) --
感谢您的所有帮助,我现在离解决问题更近了。根据 muelleth 的回答,我尝试:
/// Read into char buffer
char * buffer = new char[length];
datFile.read(buffer, length);// length determined by ifstream.seekg()
/// Let's try HxtBuffer
HxtBuffer *input = new HxtBuffer;
cout << "sizeof HxtBuffer: " << sizeof *input << endl;
memcpy(input, buffer, length);
然后我可以显示不同的结构变量:
qDebug() << "Slice BUFFER label " << QString::fromStdString(input->hxtLabel);
qDebug() << "Slice BUFFER version " << QString::number(input->hxtVersion);
qDebug() << "Slice BUFFER hxtPrefixLength " << QString::number(input->filePrefixLength);
for (int i = 0; i < 9; i++)
{
qDebug() << i << QString::number(input->motorPositions[i]);
}
qDebug() << "Slice BUFFER filePrefix " << QString::fromStdString(input->filePrefix);
qDebug() << "Slice BUFFER dataTimeStamp " << QString::fromStdString(input->dataTimeStamp);
qDebug() << "Slice BUFFER nRows " << QString::number(input->nRows);
qDebug() << "Slice BUFFER nCols " << QString::number(input->nCols);
qDebug() << "Slice BUFFER nBins " << QString::number(input->nBins);
然后输出基本符合预期:
Slice BUFFER label "MY_LABEL"
Slice BUFFER version "3"
Slice BUFFER hxtPrefixLength "2"
0 "2109"
1 "5438"
...
7 "1055"
8 "1066"
Slice BUFFER filePrefix "-1"
Slice BUFFER dataTimeStamp "000000_000000P"
Slice BUFFER nRows "20480"
Slice BUFFER nCols "256000"
Slice BUFFER nBins "0"
EXCEPT,dataTimeStamp
,长度为 13 个字符,显示为 14 个字符。随后的 3 个变量:nRows
、nCols
和 nBins
则不正确。 (应该是 nRows=80,nCols=80,nBins=1000)。我的猜测是属于 dataTimeStamp
的第 14 个字符的位应该与 nRows
一起读取,因此级联以产生正确的 nCols
和 nBins
。
我已经使用 qDebug 单独验证(此处未显示)我写入文件的内容确实是我期望的值及其各自的大小。
我个人会尝试从文件中准确读取结构的字节数,即
int length = sizeof(HxtBuffer);
然后简单地使用 memcpy 从读取缓冲区分配一个本地结构:
HxtBuffer input;
memcpy(&input, buffer, length);
然后您可以访问您的数据,例如喜欢:
std::cout << "Data: " << input.hxtLabel << std::endl;
为什么读到缓冲,而不是使用结构体读?
HxtBuffer data;
datFile.read(reinterpret_cast<char *>(&data), sizeof data);
if(datFile && datFile.gcount()!=sizeof data)
throw io_exception();
// Can use data.
如果您想读取字符缓冲区,那么您获取数据的方式就是错误的。你可能想做这样的事情。
char *buf_offset=buffer+8+sizeof(u64); // Skip label (8 chars) and version (int64)
int mSSX = *reinterpret_cast<int*>(buf_offset);
buf_offset+=sizeof(int);
int mSSY = *reinterpret_cast<int*>(buf_offset);
buf_offset+=sizeof(int);
int mSSZ = *reinterpret_cast<int*>(buf_offset);
/* etc. */
或者,更好一点(前提是您不更改缓冲区的内容)。
int *ptr_motors=reinterpret_cast<int *>(buffer+8+sizeof(u64));
int &mSSX = ptr_motors[0];
int &mSSY = ptr_motors[1];
int &mSSZ = ptr_motors[2];
/* etc. */
请注意,我没有将 mSSX
、mSSY
等声明为指针。您的代码将它们打印为地址,因为 you 告诉编译器它们是地址(指针)。