Can a Select Count(*) Affect Writes in Cassandra?
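I ran into a situation where running a select count(*) on a table every minute (yes, this should definitely be avoided) caused a huge increase in Cassandra writes, to around 150K writes per second.
Can anyone explain this strange behavior? Why would a Select query significantly increase the write count in Cassandra?
Thanks!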
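If you check
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBackground
and
org.apache.cassandra.metrics:type=ReadRepair,name=RepairedBlocking
you can see the metrics for read repairs sending mutations. If your data is inconsistent, reading all of it to serve the count(*) may be triggering a lot of read repairs. If that is the case, lowering read_repair_chance and dclocal_read_repair_chance on the table (with ALTER TABLE) can reduce the load.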
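As a rough illustration of the ALTER TABLE change mentioned above, here is a minimal Python sketch that only builds the CQL statement as a string. The function name build_read_repair_alter and the keyspace and table names my_ks and my_table are hypothetical placeholders, and the value 0.0 is just an example. Note that these two table options exist only in Cassandra versions before 4.0, so check your version before running the statement.

```python
def build_read_repair_alter(keyspace: str, table: str,
                            read_repair_chance: float = 0.0,
                            dclocal_read_repair_chance: float = 0.0) -> str:
    """Build a CQL ALTER TABLE statement that sets both read repair chances.

    keyspace and table are placeholders supplied by the caller; the two
    chances are probabilities and must lie between 0.0 and 1.0.
    """
    options = {
        "read_repair_chance": read_repair_chance,
        "dclocal_read_repair_chance": dclocal_read_repair_chance,
    }
    for name, value in options.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    # Join the options with AND, as CQL expects for multiple table properties.
    settings = " AND ".join(f"{name} = {value}" for name, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH {settings};"


if __name__ == "__main__":
    # Hypothetical keyspace and table names, for illustration only.
    print(build_read_repair_alter("my_ks", "my_table"))
```

The resulting statement can be pasted into cqlsh. Lowering these values only reduces the probabilistic background read repair; it does not fix the underlying inconsistency between replicas.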
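Other possibilities are:
- You have tracing enabled (globally or on the table) for some percentage of requests.
- Or, if you are using DSE, you have slow query logging enabled.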
A possible explanation can be found in the write path of an update:
During a write, Cassandra adds each new row to the database without checking on whether a duplicate record exists. This policy makes it possible that many versions of the same row may exist in the database.
Then:
Most Cassandra installations store replicas of each row on two or more nodes. Each node performs compaction independently. This means that even though out-of-date versions of a row have been dropped from one node, they may still exist on another node.
Finally:
This is why Cassandra performs another round of comparisons during a read process. When a client requests data with a particular primary key, Cassandra retrieves many versions of the row from one or more replicas.
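To make the link between reads and writes concrete, here is a toy Python model of the idea described in the quotes above. It is a sketch under simplifying assumptions, not Cassandra's actual implementation: Replica, read_with_repair, and count_all are hypothetical names, every read consults every replica, and a "repair" is just a write of the newest version to any replica that is stale or missing the row.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy replica: maps row key -> (timestamp, value) and counts writes."""
    name: str
    rows: dict = field(default_factory=dict)
    writes: int = 0

    def write(self, key, timestamp, value):
        # Last write wins: keep the version with the highest timestamp.
        current = self.rows.get(key)
        if current is None or timestamp > current[0]:
            self.rows[key] = (timestamp, value)
        self.writes += 1


def read_with_repair(replicas, key):
    """Read one row from all replicas and repair the ones that are behind."""
    versions = [replica.rows.get(key) for replica in replicas]
    present = [version for version in versions if version is not None]
    if not present:
        return None
    newest = max(present, key=lambda version: version[0])
    for replica, version in zip(replicas, versions):
        if version != newest:
            # The read itself generates a write (a repair mutation).
            replica.write(key, *newest)
    return newest[1]


def count_all(replicas):
    """Toy count(*): reads every known row, so every row can be repaired."""
    keys = set()
    for replica in replicas:
        keys.update(replica.rows)
    return sum(1 for key in keys if read_with_repair(replicas, key) is not None)


if __name__ == "__main__":
    replicas = [
        Replica("a", {"k1": (2, "new"), "k2": (1, "x")}),
        Replica("b", {"k1": (1, "old"), "k2": (1, "x")}),
        Replica("c", {"k2": (1, "x")}),
    ]
    print("rows:", count_all(replicas))
    print("writes caused by the count:", sum(r.writes for r in replicas))
```

In this model the first count touches every row and writes to each replica that was behind, while a second count over the now-consistent replicas causes no further writes. A full-table count on a large, inconsistent table would scale this effect to every row it reads.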