Embedded key-value db vs. just storing one file per key?
I'm confused about the advantage of embedded key-value databases over the naive solution of just storing one file on disk per key. For example, databases like RocksDB, Badger, and SQLite use fancy data structures such as B+ trees and LSM trees, yet they seem to get roughly the same performance as this simple solution.
For example, Badger (the fastest embedded database in Go) takes about 800 microseconds to write an entry. In comparison, creating a new file from scratch and writing some data to it takes about 150 microseconds, with no optimizations applied.
Edit: To clarify, here is the naive key-value store implementation I'm comparing against state-of-the-art embedded databases: just hash each key to a string filename, and store the associated value as a byte array in that file. Reads and writes each take about 150 microseconds, which is faster than Badger for single operations and comparable for batched operations. In addition, disk space usage is minimal, since no extra structure is stored beyond the actual values.
I must be missing something, because the solutions people actually use are very elaborate, optimized with things like Bloom filters and B+ trees.
But Badger isn't designed for writing just "an" entry:
My writes are really slow. Why?
Are you creating a new transaction for every single key update? This will lead to very low throughput.
To get best write performance, batch up multiple writes inside a transaction using single DB.Update() call.
You could also have multiple such DB.Update() calls being made concurrently from multiple goroutines.
Which leads to issue 396:
I was looking for fast storage in Go and so my first try was BoltDB. I need a lot of single-write transactions. Bolt was able to do about 240 rq/s.
I just tested Badger and I got a crazy 10k rq/s. I am just baffled
That's because:
LSM tree has an advantage compared to B+ tree when it comes to writes.
Also, values are stored separately in value log files so writes are much faster.
You can read more about the design here.
One of the key points (hard to replicate with naive file reads/writes) is:
Key-Value separation
The major performance cost of LSM-trees is the compaction process. During compactions, multiple files are read into memory, sorted, and written back. Sorting is essential for efficient retrieval, for both key lookups and range iterations. With sorting, the key lookups would only require accessing at most one file per level (excluding level zero, where we’d need to check all the files). Iterations would result in sequential access to multiple files.
Each file is of fixed size, to enhance caching. Values tend to be larger than keys. When you store values along with the keys, the amount of data that needs to be compacted grows significantly.
In Badger, only a pointer to the value in the value log is stored alongside the key. Badger employs delta encoding for keys to reduce the effective size even further. Assuming 16 bytes per key and 16 bytes per value pointer, a single 64MB file can store two million key-value pairs.
Your question assumes that the only operations needed are single random reads and writes. Those are the worst case for log-structured merge (LSM) approaches such as Badger or RocksDB. Range queries, which return all keys or key-value pairs within a range, exploit sequential reads (thanks to the adjacency of sorted kv pairs within files) to read data at very high rates. With Badger, you'd mainly benefit when doing key-only or small-value range queries, since those are stored in the LSM, while large values are appended to a log file that isn't necessarily sorted. With RocksDB, you'd get fast range queries over full kv pairs.
The previous answer addresses, to some extent, the advantage for writes: buffering. If you write many kv pairs, rather than storing each pair in a separate file, LSM approaches hold them in memory and eventually flush them in a single file write. There is no free lunch, though, so asynchronous compaction must be performed to remove overwritten data and to prevent queries from having to check too many files.
Answered previously here. Broadly similar to the other answers provided here, but with one important additional point: files in a filesystem cannot share the same block on disk. If your records are, on average, significantly smaller than the typical disk block size (4-16 KiB), storing them as separate files incurs substantial storage overhead.