Why is data corrupt when reading back from a file as it's being written with O_DIRECT
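I have a C++ program that uses the POSIX API to write a file opened with O_DIRECT. Concurrently, another thread reads back from the same file through a different file descriptor. I've noticed that occasionally the data read back from the file contains all zeros rather than the actual data I wrote. Why is this?
Here's an MCVE in C++17. Compile with g++ -std=c++17 -Wall -pthread -otest test.cpp or equivalent. Sorry, I couldn't seem to make it any shorter. All it does is write 100 MiB of a constant byte (0x5A) to a file in one thread and read it back in another, printing a message if any read-back byte is not equal to 0x5A.
WARNING, this MCVE will delete and rewrite any file in the current working directory named foo.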
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

constexpr size_t CHUNK_SIZE = 1024 * 1024;
constexpr size_t TOTAL_SIZE = 100 * CHUNK_SIZE;

int main(int argc, char *argv[])
{
    ::unlink("foo");

    // Writer: O_DIRECT bypasses the page cache, so the buffer must be aligned.
    std::thread write_thread([]()
    {
        int fd = ::open("foo", O_WRONLY | O_CREAT | O_DIRECT, 0777);
        if (fd < 0) std::exit(-1);
        uint8_t *buffer = static_cast<uint8_t *>(
            std::aligned_alloc(4096, CHUNK_SIZE));
        std::fill(buffer, buffer + CHUNK_SIZE, 0x5A);
        size_t written = 0;
        while (written < TOTAL_SIZE)
        {
            ssize_t rv = ::write(fd, buffer,
                                 std::min(TOTAL_SIZE - written, CHUNK_SIZE));
            if (rv < 0) { std::cerr << "write error" << std::endl; std::exit(-1); }
            written += rv;
        }
    });

    // Reader: ordinary buffered I/O on a separate descriptor for the same file.
    std::thread read_thread([]()
    {
        int fd = ::open("foo", O_RDONLY, 0);
        if (fd < 0) std::exit(-1);
        uint8_t *buffer = new uint8_t[CHUNK_SIZE];
        size_t checked = 0;
        while (checked < TOTAL_SIZE)
        {
            // A return value of 0 means the reader has caught up with the
            // writer; the loop simply retries until more data appears.
            ssize_t rv = ::read(fd, buffer, CHUNK_SIZE);
            if (rv < 0) { std::cerr << "read error" << std::endl; std::exit(-1); }
            for (ssize_t i = 0; i < rv; ++i)
                if (buffer[i] != 0x5A)
                    std::cerr << "readback mismatch at offset " << checked + i << std::endl;
            checked += rv;
        }
    });

    write_thread.join();
    read_thread.join();
}
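(Details like proper error checking and resource management are omitted here for the sake of the MCVE. This is not my actual program, but it shows the same behavior.)
I'm testing on Linux 4.15.0 with an SSD. About 1/3 of the time that I run the program, it prints the "readback mismatch" message. The rest of the time it doesn't. In all cases, if I examine foo after the fact, I find that it does contain the correct data.
If you remove O_DIRECT from the ::open() flags in the write thread, the problem goes away and the "readback mismatch" message never prints.
I could understand why my ::read() might return 0 or something else to indicate that I've already read everything that has been flushed to disk. But I can't understand why it would perform what appears to be a successful read, but with data other than what I wrote. Clearly I'm missing something, but what is it?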
So, O_DIRECT has some additional constraints that may make it a poor fit for what you need. From the open(2) man page:
Applications should avoid mixing O_DIRECT and normal I/O to the same file, and especially to overlapping byte regions in the same file. Even when the filesystem correctly handles the coherency issues in this situation, overall I/O throughput is likely to be slower than using either mode alone.
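One way to follow that advice literally is to open the reader with O_DIRECT as well, so both descriptors use the same mode. Below is a minimal sketch of such a reader. The helper name count_mismatches_direct and the 4096-byte alignment are my assumptions, not something from the question; the real alignment requirement depends on the filesystem and device. I have not checked whether this removes the zero-filled reads in the MCVE, so treat it as something to try rather than a confirmed fix.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

// Assumed values: O_DIRECT alignment rules vary by filesystem and device.
constexpr std::size_t ALIGNMENT = 4096;
constexpr std::size_t READ_CHUNK = 1024 * 1024;  // multiple of ALIGNMENT

// Hypothetical helper: reads `path` through an O_DIRECT descriptor and counts
// the bytes that differ from `expected`. Returns -1 if the open, allocation,
// or a read fails (for example when the filesystem rejects O_DIRECT).
long count_mismatches_direct(const char *path, std::uint8_t expected)
{
    // O_DIRECT is a Linux extension; g++ exposes it because it defines
    // _GNU_SOURCE by default.
    int fd = ::open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) return -1;

    // O_DIRECT needs an aligned buffer, so plain new[] is not sufficient.
    auto *buffer = static_cast<std::uint8_t *>(
        std::aligned_alloc(ALIGNMENT, READ_CHUNK));
    if (buffer == nullptr) { ::close(fd); return -1; }

    long mismatches = 0;
    for (;;)
    {
        ssize_t rv = ::read(fd, buffer, READ_CHUNK);
        if (rv < 0) { mismatches = -1; break; }  // read error
        if (rv == 0) break;                      // end of file
        for (ssize_t i = 0; i < rv; ++i)
            if (buffer[i] != expected) ++mismatches;
    }

    std::free(buffer);
    ::close(fd);
    return mismatches;
}
```

Calling count_mismatches_direct("foo", 0x5A) after the writer has finished should report 0 for an intact file, or -1 if direct I/O is not available there.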
Instead, I think O_SYNC might be better, since it does provide the expected guarantees:
O_SYNC provides synchronized I/O file integrity completion, meaning write operations will flush data and all associated metadata to the underlying hardware. O_DSYNC provides synchronized I/O data integrity completion, meaning write operations will flush data to the underlying hardware, but will only flush metadata updates that are required to allow a subsequent read operation to complete successfully. Data integrity completion can reduce the number of disk operations that are required for applications that don't need the guarantees of file integrity completion.
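To make that suggestion concrete, here is a rough sketch of a writer that uses O_SYNC in place of O_DIRECT, together with a buffered reader that checks the result. The helper names write_pattern_sync and count_mismatches and the 64 KiB buffer size are made up for illustration. Because the writes go through the page cache, no buffer alignment is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes `total` bytes of `value` to `path` through an
// O_SYNC descriptor. Returns true on success.
bool write_pattern_sync(const char *path, std::size_t total, std::uint8_t value)
{
    // With O_SYNC each write() returns only after the data and associated
    // metadata have been flushed; the data still passes through the page cache.
    int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0) return false;

    std::vector<std::uint8_t> buffer(64 * 1024, value);  // arbitrary chunk size
    std::size_t written = 0;
    while (written < total)
    {
        ssize_t rv = ::write(fd, buffer.data(),
                             std::min(total - written, buffer.size()));
        if (rv < 0) { ::close(fd); return false; }
        written += static_cast<std::size_t>(rv);
    }
    return ::close(fd) == 0;
}

// Hypothetical helper: counts the bytes in `path` that differ from `value`,
// using an ordinary buffered descriptor. Returns -1 on error.
long count_mismatches(const char *path, std::uint8_t value)
{
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;

    std::vector<std::uint8_t> buffer(64 * 1024);
    long mismatches = 0;
    for (;;)
    {
        ssize_t rv = ::read(fd, buffer.data(), buffer.size());
        if (rv < 0) { ::close(fd); return -1; }
        if (rv == 0) break;  // end of file
        mismatches += std::count_if(
            buffer.begin(), buffer.begin() + rv,
            [value](std::uint8_t b) { return b != value; });
    }
    ::close(fd);
    return mismatches;
}
```

In the MCVE this would amount to replacing O_DIRECT with O_SYNC in the writer's ::open() call; the aligned allocation can stay but is no longer required. Expect synchronous writes to be noticeably slower than plain buffered writes.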