What options are there to speed up a full repair in Cassandra?

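I have a Cassandra datacenter on which I would like to run a full repair. The datacenter is used for analytics/batch processing, and I am willing to sacrifice latency to speed up a full repair (nodetool repair). Writes to the datacenter are moderate.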

What options do I have to speed up a full repair? Some ideas:

Additional information:

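By default, a full repair runs sequentially. The state of each node's dataset, and the differences between nodes, are stored in binary trees (Merkle trees). Recreating these trees is the main cost here. According to this DataStax blog entry: "Every time a repair is carried out, the tree has to be calculated, each node that is involved in the repair has to construct its merkle tree from all the sstables it stores making the calculation very expensive."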
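To illustrate why rebuilding the tree dominates the cost, here is a toy Merkle tree in Python. It is only a sketch under stated assumptions: the row encoding, the use of SHA-256, and the function names are made up for illustration, and Cassandra's actual implementation differs. The point it shows is that every build has to hash all of the data again, even if only one row changed.

```python
import hashlib


def _h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an assumption chosen for this sketch)."""
    return hashlib.sha256(data).digest()


def leaf_hashes(rows):
    """Hash every row once; this is the part that touches all the data."""
    return [_h(row) for row in rows]


def merkle_root(rows):
    """Build the tree bottom-up and return the root hash."""
    if not rows:
        raise ValueError("need at least one row")
    level = leaf_hashes(rows)
    while len(level) > 1:
        if len(level) % 2:
            # Duplicate the last hash so every node has a partner.
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def differing_leaves(rows_a, rows_b):
    """Return indexes of rows whose hashes differ (equal-length inputs assumed)."""
    pairs = zip(leaf_hashes(rows_a), leaf_hashes(rows_b))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


if __name__ == "__main__":
    replica_a = [b"k1:v1", b"k2:v2", b"k3:v3", b"k4:v4"]
    replica_b = [b"k1:v1", b"k2:v2", b"k3:stale", b"k4:v4"]
    print("roots match:", merkle_root(replica_a) == merkle_root(replica_b))
    print("rows to repair:", differing_leaves(replica_a, replica_b))
```

Comparing two roots is cheap, but producing each root means reading and hashing everything the node stores for the range being repaired.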

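I think the only ways to significantly speed up a full repair are to run it in parallel or to repair it subrange by subrange. Your tags suggest you are running Cassandra 2.0.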

1) Parallel full repair

 nodetool repair -par, or --parallel, means carry out a parallel repair.

According to the nodetool documentation for Cassandra 2.0:

Unlike sequential repair (described above), parallel repair constructs the Merkle tables for all nodes at the same time. Therefore, no snapshots are required (or generated). Use a parallel repair to complete the repair quickly or when you have operational downtime that allows the resources to be completely consumed during the repair.

2) Subrange repair

nodetool accepts start and end token parameters, like this:

 nodetool repair -st (start token) -et (end token) $keyspace $columnfamily

For simplicity, have a look at this Python script, which calculates the tokens for you and performs the range repairs: https://github.com/BrianGallew/cassandra_range_repair
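The token arithmetic behind a subrange repair can be sketched roughly as follows. This is not the linked script's implementation, just a minimal illustration. It assumes Murmur3Partitioner token bounds (-2^63 to 2^63-1); the function names and the keyspace and column family names are hypothetical. In practice the start and end tokens would come from the ranges a node actually owns, which this sketch does not look up.

```python
# Assumed token bounds for Murmur3Partitioner.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split the token range from start to end into `parts` contiguous subranges."""
    if end <= start:
        raise ValueError("end must be greater than start")
    width = end - start
    if parts < 1 or parts > width:
        raise ValueError("parts must be between 1 and the range width")
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [start + width * i // parts for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_commands(start, end, parts, keyspace, column_family):
    """Build one `nodetool repair -st ... -et ...` command line per subrange."""
    return [
        f"nodetool repair -st {st} -et {et} {keyspace} {column_family}"
        for st, et in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    # "my_keyspace" and "my_table" are placeholder names.
    for cmd in repair_commands(MIN_TOKEN, MAX_TOKEN, 4, "my_keyspace", "my_table"):
        print(cmd)
```

Each command repairs a smaller slice, so each Merkle tree covers less data, and the slices can be run one after another or spread out over time.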

Let me point out two alternatives:

A) Incremental repair, as pointed out by Jeff Jirsa

Incremental repairs are available starting with Cassandra 2.1. You need to perform certain migration steps before you can use nodetool like this:

nodetool repair -inc, or --incremental means do an incremental repair.

B) OpsCenter Repair Service

At my company, itembase.com, we use the repair service in DataStax OpsCenter, which performs and manages small-range repairs as a service on several of our clusters.