在 torrent kademlia 路由上实现查找节点 table

Question

我已经审阅了很多关于这个主题的文档，但有些地方不是很清楚。例如比特流文件 (http://www.bittorrent.org/beps/bep_0005.html) 状态

The routing table is subdivided into "buckets" that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2^160. When a node with ID "N" is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming "full." When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2^159 and 2^159..2^160.

它与其他关于 kademlia 路由 table 的文档有些不同，其中根据节点 ID 的位前缀运行存储桶，但还有另一件令人困惑的事情。当我们回复 "find node" 请求时，我们必须使用 XOR 运算找到与请求的节点最近的 8 个节点。我看到一些实现只是遍历路由 table 中的每个项目执行 XOR 操作，从而找到 8 个最接近的项目。在我看来也是CPU浪费。

一切都在桶里。即使我们使用 bit torrent 文档系统的建议，我们也可以更快地找到可能包含请求的节点 ID 的桶，只需枚举桶并检查其上的最小和最大数量。然后可能那个桶应该包含关闭节点，但它们是值最近的节点而不是 XOR 最近的节点（据我所知）这有点不同但有点相似。

我运行一个简单的测试，使用 0 到 99 之间的数字，我想找到 8 个 XOR 最接近的数字，它们在所寻找的数字附近但不在它附近。现在，考虑我们的存储桶，我猜存储桶中的所有节点 ID 可能都是最接近小异常的。因此，例如，如果我们拿这个桶，从左边拿一个，从右边拿一个，然后搜索 XOR 最近的节点 ID，我们会找到我们要找的东西，并且没有必要遍历路由中的所有节点table.

我是对的还是我漏掉了什么？

Answer 1

It is somewhat different from other documents regarding kademlia routing table where buckets are arranged in accordance to the bit prefix of the node id but there is another confusing thing.

bittorrent 规范描述了一种路由 table 实现，它仅近似于 kademlia paper 中描述的实现。实现起来比较容易，但也有一些不足。

So, for example if we take this bucket, take one from the left and one from the right and search for the XOR closest node ids we'll find what we are looking for and there is no point to go through ALL nodes in the routing table.

在两者中 - 完整的树状路由 table 实现和简化的 BEP5 变体 - 每个桶都可以被认为有一个 CIDR-like prefix (see ) 由桶的前缀位组成覆盖和掩码位计数。

在 BEP5 变体中，每个桶的前缀只是从数组索引和您的节点自己的 ID 派生而来。在树状 table 桶中，由于桶 split/merge 操作，桶必须跟踪它们的前缀。

使用这些前缀，您可以找到包含目标键的存储桶。

问题是桶不必装满，如果要发送，假设一个响应中有 20 个节点，一个桶是不够的。

所以你必须遍历路由table（根据你自己的节点ID或自然距离排序）以升序（异或）顺序相对于目标键访问多个桶。

由于 XOR 距离度量在每个进位处折叠（XOR == 无进位加法），因此它不能很好地映射到任何路由 table 布局。换句话说，访问最近的桶是行不通的。

Here's my implementation 用于从树状路由 table 中找到距特定目标键最近的 N 个节点。

我想很多人只是简单地遍历整个路由 table 因为对于常规节点，它最多只包含几十个桶，而 DHT 节点看不到太多流量，所以它只需要每秒执行几次这个操作，如果你在一个密集的、缓存友好的数据结构中实现这个，那么最大的份额实际上可能是内存流量，而不是 CPU 执行一些异或和比较的指令。

即全table扫描很容易实现。

假设我们有一个路由 table，每个桶都有以下位前缀。这些字母用作方便的名称）。

A 000... 
B 00100... 
C 0010100... 
D 0010101000... 
E 0010101001...
F 001010101... 
G 00101011... 
H 001011... 
I 0011... 
J 01... 
K 1...

现在假设我们正在寻找这个目标键：

T = 0010011010111111001111100101011000001010010100001100100010011100000000100100000111001101100110110110101100010100111010111001101111110101001001001000100000001001

另外，桶没有完全满，或者我们需要比单个桶所能容纳的条目更多的条目，因此我们必须访问多个桶才能获得所需的数量。

现在，访问的第一个存储桶非常明显，它是 B，因为它的前缀覆盖了目标键。

由于 B 的前缀长度为 5 位，因此该桶中的任何条目都将与 T 的 00000???????... 具有异或距离。共享 5 个前缀位。

B 是最接近 T 的桶，这意味着不能有任何路由 table 条目比相对距离 00000... 更近。相反，这意味着 B 之外的任何条目可以具有的最小距离是 00001...。这意味着下一个最近的桶必须覆盖 T xor 00001... -> 00101110101111110[...].

覆盖这个的桶是 H。

H 与 B

不相邻

最终 - 假设我们必须访问 6 个桶 - 顺序将如下所示：

00100...      -> B
001011...     -> H
001010101...  -> F
0010101000... -> D
0010101001... -> E
00101011...   -> G

这看起来相当混乱。但是，如果我们绘制每个访问的桶的前缀到目标键的距离，它就会变得更加明显：

00000...
000010...
000011000...
0000110010...
0000110011...
00001101...

所以算法如下：

找到覆盖目标键的初始桶
将存储桶的前缀与目标键（零掩码尾随位）进行异或运算
将距离增加该前缀的最低有效位
与目标键异或增量距离
找到下一个包含异或密钥的桶
转到 2

TL;DR: "just look one bucket left, one bucket right" 是不够的。正确的算法相当复杂，对整个table进行线性扫描更容易实现。

在 torrent kademlia 路由上实现查找节点 table

Implementing find node on torrent kademlia routing table

routing

bittorrent

dht

kademlia