Performance issues while processing 2 tables in lockstep based on orderedBy from-to

Question

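The title may not be very clear, so let me explain.

I want to do an in-process join (Node.js) on 2 tables*, Session and SessionAction (1-N).

Because these tables are quite large (both have millions of records), my idea was to fetch slices ordered by sessionId (which both tables share) and walk the two tables in lock-step, batch by batch.

However, this turns out to be very slow. I use pseudocode like the following on both tables to fetch a batch: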

table('x').orderBy({index: "sessionId"}).filter(row.sessionId > start && row.sessionId < y)

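It seems that even though I am essentially filtering on sessionId, an attribute that has an index, the query planner is not smart enough to see this, and every query does a full table scan to perform the orderBy before filtering afterwards (or so it seems).

Of course this is very wasteful, but I don't see any other option. For example:

Rethink does not support ordering after filtering.
Fetching a slice of the ordered table does not work either, because slice enumeration (i.e. the x-th to the y-th record) would, for lack of a better word, not line up between the 2 tables.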

Question:

*) This is too complex to do with Rethink ReQL alone.

Answer 1

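filter is never indexed in RethinkDB. (In general, a particular command will only use a secondary index if you pass index as one of its optional arguments.) You can write that query like this to avoid scanning the whole table: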

r.table('x').orderBy({index: 'sessionId'}).between(start, y, {index: 'sessionId'})
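
Note that between includes the lower bound and excludes the upper bound by default, so consecutive batches [start, y), [y, z) and so on do not overlap. Once both tables return a batch for the same sessionId range, the lock-step walk described in the question can be done in a single pass over the two arrays. The following is only a minimal sketch of that step, assuming each batch has already been fetched into an array sorted ascending by sessionId and that sessionId values compare the same way in JavaScript as in the index. The name mergeJoin and the sample rows are made up for illustration:

```javascript
// Merge-join two batches that are both sorted ascending by sessionId.
// `sessions` holds one row per sessionId; `actions` may hold many (1-N).
function mergeJoin(sessions, actions) {
  const joined = [];
  let a = 0; // cursor into `actions`, only ever moves forward
  for (const session of sessions) {
    // Skip actions that belong to a sessionId missing from `sessions`.
    while (a < actions.length && actions[a].sessionId < session.sessionId) a++;
    // Collect every action that belongs to the current session.
    const matched = [];
    while (a < actions.length && actions[a].sessionId === session.sessionId) {
      matched.push(actions[a]);
      a++;
    }
    joined.push({ session, actions: matched });
  }
  return joined;
}

// In-memory batches standing in for the results of two range queries.
const sessions = [{ sessionId: 1 }, { sessionId: 2 }, { sessionId: 4 }];
const actions = [
  { sessionId: 1, type: 'click' },
  { sessionId: 1, type: 'scroll' },
  { sessionId: 3, type: 'orphan' },
  { sessionId: 4, type: 'click' },
];
const result = mergeJoin(sessions, actions);
console.log(result.map((row) => [row.session.sessionId, row.actions.length]));
```

With the sample data above, session 1 gets two actions, session 2 gets none, and session 4 gets one. The action for sessionId 3 is skipped because no matching session exists in the batch.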
