Why is data corrupt when reading back from a file while it is being written with O_DIRECT?

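I have a C++ program that uses the POSIX API to write a file opened with O_DIRECT. Concurrently, another thread reads back from the same file via a different file descriptor. I've noticed that the data read back occasionally contains all zeroes rather than the data I actually wrote. Why is this?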

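Here's an MCVE in C++17. Compile with g++ -std=c++17 -Wall -pthread -otest test.cpp or equivalent. Sorry, I couldn't seem to make it any shorter. All it does is write 100 MiB of a constant byte (0x5A) to a file in one thread and read it back in another, printing a message if any read-back byte is not equal to 0x5A.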

WARNING, this MCVE will delete and rewrite any file in the current working directory named foo.

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>

constexpr size_t CHUNK_SIZE = 1024 * 1024;
constexpr size_t TOTAL_SIZE = 100 * CHUNK_SIZE;

int main(int argc, char *argv[])
{
    ::unlink("foo");

    std::thread write_thread([]()
    {
        int fd = ::open("foo", O_WRONLY | O_CREAT | O_DIRECT, 0777);
        if (fd < 0) std::exit(-1);

        uint8_t *buffer = static_cast<uint8_t *>(
            std::aligned_alloc(4096, CHUNK_SIZE));

        std::fill(buffer, buffer + CHUNK_SIZE, 0x5A);

        size_t written = 0;
        while (written < TOTAL_SIZE)
        {
            ssize_t rv = ::write(fd, buffer,
                std::min(TOTAL_SIZE - written, CHUNK_SIZE));
            if (rv < 0) { std::cerr << "write error" << std::endl; std::exit(-1); }
            written += rv;
        }
    });

    std::thread read_thread([]()
    {
        int fd = ::open("foo", O_RDONLY, 0);
        if (fd < 0) std::exit(-1);

        uint8_t *buffer = new uint8_t[CHUNK_SIZE];

        size_t checked = 0;
        while (checked < TOTAL_SIZE)
        {
            ssize_t rv = ::read(fd, buffer, CHUNK_SIZE);
            if (rv < 0) { std::cerr << "read error" << std::endl; std::exit(-1); }

            for (ssize_t i = 0; i < rv; ++i)
                if (buffer[i] != 0x5A)
                    std::cerr << "readback mismatch at offset " << checked + i << std::endl;

            checked += rv;
        }
    });

    write_thread.join();
    read_thread.join();
}

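(Details such as proper error checking and resource management are omitted for the sake of the MCVE. This is not my actual program, but it shows the same behavior.)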

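I'm testing on Linux 4.15.0 with an SSD. About 1/3 of the time that I run the program, it prints the "readback mismatch" message. The rest of the time it doesn't. In all cases, if I examine foo after the fact, I find that it does contain the correct data.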

If you remove O_DIRECT from the ::open() flags in the write thread, the problem goes away and the "readback mismatch" message never prints.

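I could understand why my ::read() might return 0 or something else indicating that I've already read everything that has been flushed to disk. But I can't understand why it would perform what appears to be a successful read, yet with data other than what I wrote. Clearly I'm missing something, but what is it?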

So, O_DIRECT has some additional constraints that might make it not what you're looking for:

Applications should avoid mixing O_DIRECT and normal I/O to the same file, and especially to overlapping byte regions in the same file. Even when the filesystem correctly handles the coherency issues in this situation, overall I/O throughput is likely to be slower than using either mode alone.
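One way to read this advice is to use the same I/O mode on both sides. The following is a minimal sketch of a reader that also opens the file with O_DIRECT, so that it does not mix buffered reads with direct writes. The helper name count_mismatches_direct and the 4096-byte alignment are assumptions made for illustration, and whether this removes the mismatch on a given kernel and filesystem is untested here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

// Assumed alignment for O_DIRECT buffers and transfer sizes. The real
// requirement depends on the filesystem and the underlying device.
constexpr size_t DIRECT_ALIGN = 4096;

// Hypothetical helper: reads `path` through O_DIRECT and counts the bytes
// that differ from `expected`. Returns that count, or -1 if the file cannot
// be opened or read this way (for example, when the filesystem does not
// support O_DIRECT).
ssize_t count_mismatches_direct(const char *path, uint8_t expected, size_t chunk)
{
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    assert(chunk > 0 && chunk % DIRECT_ALIGN == 0);

    int fd = ::open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) return -1;

    uint8_t *buffer = static_cast<uint8_t *>(
        std::aligned_alloc(DIRECT_ALIGN, chunk));
    if (buffer == nullptr) { ::close(fd); return -1; }

    ssize_t mismatches = 0;
    for (;;)
    {
        ssize_t rv = ::read(fd, buffer, chunk);
        if (rv < 0) { mismatches = -1; break; }

        for (ssize_t i = 0; i < rv; ++i)
            if (buffer[i] != expected)
                ++mismatches;

        // This sketch treats a short read as end of file. It also avoids
        // issuing another direct read from an unaligned file offset.
        if (static_cast<size_t>(rv) < chunk) break;
    }

    std::free(buffer);
    ::close(fd);
    return mismatches;
}
```

Unlike the MCVE's reader, this sketch stops at the first short read instead of polling, so it is meant for checking a file whose writer has finished or is known to be ahead of it.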

Instead, I think O_SYNC might be a better fit, since it does provide the expected guarantees:

O_SYNC provides synchronized I/O file integrity completion, meaning write operations will flush data and all associated metadata to the underlying hardware. O_DSYNC provides synchronized I/O data integrity completion, meaning write operations will flush data to the underlying hardware, but will only flush metadata updates that are required to allow a subsequent read operation to complete successfully. Data integrity completion can reduce the number of disk operations that are required for applications that don't need the guarantees of file integrity completion.
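As a rough illustration of that suggestion, here is a minimal sketch of a writer that opens the file with O_DSYNC instead of O_DIRECT, together with an ordinary buffered reader that counts unexpected bytes. The helper names write_synced and count_mismatches are hypothetical, and the sketch only shows the shape of the change; it has not been verified as a fix for the mismatch on any particular kernel or filesystem.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: writes `total` bytes of `value` to `path` in pieces
// of at most `chunk` bytes, using O_DSYNC (buffered, synchronous I/O)
// rather than O_DIRECT. Returns the number of bytes written, or -1 on error.
ssize_t write_synced(const char *path, uint8_t value, size_t total, size_t chunk)
{
    assert(chunk > 0);

    int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0644);
    if (fd < 0) return -1;

    // Without O_DIRECT the buffer has no special alignment requirement.
    std::vector<uint8_t> buffer(chunk, value);

    size_t written = 0;
    while (written < total)
    {
        size_t want = std::min(total - written, chunk);
        ssize_t rv = ::write(fd, buffer.data(), want);
        if (rv < 0) { ::close(fd); return -1; }
        written += static_cast<size_t>(rv);
    }

    ::close(fd);
    return static_cast<ssize_t>(written);
}

// Hypothetical helper: reads `path` with ordinary buffered I/O and counts
// the bytes that differ from `expected`. Returns that count, or -1 on error.
ssize_t count_mismatches(const char *path, uint8_t expected)
{
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;

    std::vector<uint8_t> buffer(64 * 1024);
    ssize_t mismatches = 0;
    for (;;)
    {
        ssize_t rv = ::read(fd, buffer.data(), buffer.size());
        if (rv < 0) { mismatches = -1; break; }
        if (rv == 0) break;  // end of file

        for (ssize_t i = 0; i < rv; ++i)
            if (buffer[i] != expected)
                ++mismatches;
    }

    ::close(fd);
    return mismatches;
}
```

In the MCVE, the equivalent change would be to replace O_DIRECT with O_DSYNC (or O_SYNC) in the write thread's ::open() flags. The aligned allocation then becomes optional, since both threads go through the page cache.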