Cassandra

Question

Sometimes, when I execute a DELETE, it has no effect.

My configuration: [cqlsh 5.0.1 | Cassandra 3.0.3 | CQL spec 3.4.0 | Native protocol v4]

cqlsh:my_db> SELECT * FROM conversations  WHERE user_id=120 AND conversation_id=2 AND peer_type=1;

user_id | conversation_id | peer_type | message_map
---------+-----------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 120 |               2 |         1 | {0: {real_id: 68438, date: 1455453523, sent: True}, 1: {real_id: 68437, date: 1455453520, sent: True}, 2: {real_id: 68436, date: 1455453517, sent: True}, 3: {real_id: 68435, date: 1455453501, sent: True}, 4: {real_id: 68434, date: 1455453500, sent: True}, 5: {real_id: 68433, date: 1455453499, sent: True}, 6: {real_id: 68432, date: 1455453498, sent: True}, 7: {real_id: 68431, date: 1455453494, sent: True}, 8: {real_id: 68430, date: 1455453480, sent: True}}

(1 rows)
cqlsh:my_db> DELETE message_map FROM conversations WHERE user_id=120 AND conversation_id=2 AND peer_type=1;
cqlsh:my_db> SELECT * FROM conversations  WHERE user_id=120 AND conversation_id=2 AND peer_type=1;

user_id | conversation_id | peer_type | message_map
---------+-----------------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 120 |               2 |         1 | {0: {real_id: 68438, date: 1455453523, sent: True}, 1: {real_id: 68437, date: 1455453520, sent: True}, 2: {real_id: 68436, date: 1455453517, sent: True}, 3: {real_id: 68435, date: 1455453501, sent: True}, 4: {real_id: 68434, date: 1455453500, sent: True}, 5: {real_id: 68433, date: 1455453499, sent: True}, 6: {real_id: 68432, date: 1455453498, sent: True}, 7: {real_id: 68431, date: 1455453494, sent: True}, 8: {real_id: 68430, date: 1455453480, sent: True}}

(1 rows)

cqlsh does not return any error for the DELETE statement, but it behaves as if the statement had been ignored.

Do you know why?

Note: this is my table definition:

CREATE TABLE be_telegram.conversations (
user_id bigint,
conversation_id int,
peer_type int,
message_map map<int, frozen<message>>,
PRIMARY KEY (user_id, conversation_id, peer_type)
) WITH CLUSTERING ORDER BY (conversation_id ASC, peer_type ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Answer 1

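A DELETE statement removes one or more columns from one or more rows in a table, or removes the entire row if no columns are specified. Cassandra applies selections within the same partition key atomically and in isolation.

When a column is deleted, it is not removed from disk immediately. The deleted column is marked with a tombstone and is removed after the configured grace period has expired. The optional timestamp defines the new tombstone record.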
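To make the role of the tombstone timestamp concrete, here is a minimal Python sketch of timestamp-based reconciliation. It is a toy model, not Cassandra's implementation: the `Cell` class and the function names are made up for illustration, and the tie-breaking rule is an assumption of the sketch. It shows that a tombstone only hides a value whose write timestamp is not newer than the tombstone's own. Whether that is what happened in the question is not established by the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """One version of a column value; value=None marks a tombstone (hypothetical model)."""
    value: Optional[str]
    timestamp: int  # write time; the unit does not matter for this sketch

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(versions):
    """Return the visible value: the version with the highest timestamp wins.

    On a timestamp tie, the tombstone wins in this sketch.
    """
    winner = max(versions, key=lambda c: (c.timestamp, c.is_tombstone))
    return None if winner.is_tombstone else winner.value


def read_after_delete(write_ts: int, delete_ts: int):
    """Write a value, record a tombstone, then read the column back."""
    versions = [Cell("message_map contents", write_ts), Cell(None, delete_ts)]
    return reconcile(versions)


if __name__ == "__main__":
    # Tombstone newer than the write: the column reads as deleted.
    print(read_after_delete(write_ts=100, delete_ts=200))
    # Write carries a later timestamp than the tombstone: the value stays visible.
    print(read_after_delete(write_ts=300, delete_ts=200))
```

If the second case applies, checking the stored write time of the affected column against the current time would be a reasonable first diagnostic step.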

About deletes in Cassandra

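The way Cassandra deletes data differs from the way a relational database deletes data. A relational database might spend time scanning through data looking for expired data and throwing it away, or an administrator might have to partition expired data by month, for example, to clear it out faster. Data in a Cassandra column can have an optional expiration date called TTL (time to live).

Facts about deleted data to keep in mind:

Cassandra does not immediately remove data marked for deletion from disk. The deletion occurs during compaction.
If you use the size-tiered or date-tiered compaction strategy, you can drop data immediately by manually starting the compaction process. Before doing so, understand the documented disadvantages of the process.
A deleted column can reappear if you do not run node repair routinely.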
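The grace-period rule behind these facts can be sketched in a few lines of Python. This is an illustration only: `can_purge` is a hypothetical helper, times are plain seconds, and the only value taken from the question is `gc_grace_seconds = 864000` from the table definition.

```python
SECONDS_PER_DAY = 86400
GC_GRACE_SECONDS = 864000  # value from the table definition in the question


def gc_grace_days(gc_grace_seconds: int = GC_GRACE_SECONDS) -> float:
    """Express the grace period in days."""
    return gc_grace_seconds / SECONDS_PER_DAY


def can_purge(tombstone_created_at: int, now: int,
              gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """True once the tombstone is older than the grace period.

    Only then may a compaction drop the tombstone in this simplified model.
    """
    return now - tombstone_created_at > gc_grace_seconds


if __name__ == "__main__":
    print(gc_grace_days())                                       # grace period in days
    print(can_purge(tombstone_created_at=0, now=3600))           # one hour after the delete
    print(can_purge(tombstone_created_at=0, now=11 * 86400))     # eleven days after the delete
```

With the table's setting, a tombstone written today stays on disk for at least ten days, however often compaction runs in between.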

Why deleted data can reappear

Marking data with a tombstone signals Cassandra to retry sending a delete request to a replica that was down at the time of the delete. If the replica comes back up within the grace period, it eventually receives the delete request. However, if a node is down longer than the grace period, the node can miss the delete because the tombstone disappears after gc_grace_seconds. Cassandra always attempts to replay missed updates when the node comes back up again. After a failure, it is a best practice to run node repair to repair inconsistencies across all of the replicas when bringing a node back into the cluster. If the node doesn't come back within gc_grace_seconds, remove the node, wipe it, and bootstrap it again.
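The scenario above can be replayed with a small two-replica simulation. This is a toy model under stated assumptions: replicas are plain dictionaries, `compact`, `repair`, and `visible` are hypothetical helpers rather than Cassandra APIs, and time is an abstract counter with a grace period of 10 ticks.

```python
def compact(replica, now, gc_grace):
    """Drop tombstones older than gc_grace from one replica (simplified)."""
    return {k: v for k, v in replica.items()
            if not (v["deleted"] and now - v["ts"] > gc_grace)}


def repair(a, b):
    """Make both replicas agree on the newest version of every key."""
    merged = {}
    for key in a.keys() | b.keys():
        candidates = [r[key] for r in (a, b) if key in r]
        merged[key] = max(candidates, key=lambda v: v["ts"])
    return dict(merged), dict(merged)


def visible(replica, key):
    """Value a read would return from this replica, or None if absent or deleted."""
    v = replica.get(key)
    return None if v is None or v["deleted"] else v["value"]


def scenario(repair_before_grace_expires: bool):
    gc_grace = 10
    a = {"k": {"value": "old", "ts": 1, "deleted": False}}
    b = {"k": {"value": "old", "ts": 1, "deleted": False}}
    # At t=5 the delete reaches replica a only; replica b is down.
    a["k"] = {"value": None, "ts": 5, "deleted": True}
    if repair_before_grace_expires:
        a, b = repair(a, b)  # the tombstone is copied to b in time
    # At t=20 the grace period is over and compaction purges tombstones.
    a = compact(a, now=20, gc_grace=gc_grace)
    b = compact(b, now=20, gc_grace=gc_grace)
    a, b = repair(a, b)  # a late repair
    return visible(a, "k"), visible(b, "k")


if __name__ == "__main__":
    print(scenario(repair_before_grace_expires=True))   # data stays deleted
    print(scenario(repair_before_grace_expires=False))  # data reappears on both replicas
```

In the second run the tombstone is already gone when the replicas are compared, so the stale copy on the replica that missed the delete looks like the only valid data and is copied back.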

In your case, the compaction strategy is size-tiered, so please try running the compaction process.

Compaction

Periodic compaction is essential to a healthy Cassandra database because Cassandra does not insert/update in place. As inserts/updates occur, instead of overwriting the rows, Cassandra writes a new timestamped version of the inserted or updated data in another SSTable. Cassandra manages the accumulation of SSTables on disk using compaction.

Cassandra also does not delete in place because the SSTable is immutable. Instead, Cassandra marks data to be deleted using a tombstone. Tombstones exist for a configured time period defined by the gc_grace_seconds value set on the table. During compaction, there is a temporary spike in disk space usage and disk I/O because the old and new SSTables co-exist. This diagram depicts the compaction process:

Compaction merges the data in each SSTable by partition key, selecting the latest data for storage based on its timestamp. Cassandra can merge the data efficiently, without random I/O, because rows are sorted by partition key within each SSTable. After evicting tombstones and removing deleted data, columns, and rows, the compaction process consolidates SSTables into a single file. The old SSTable files are deleted as soon as any pending reads finish using the files. Disk space occupied by old SSTables becomes available for reuse.

Data input to SSTables is sorted to prevent random I/O during SSTable consolidation. After compaction, Cassandra uses the new consolidated SSTable instead of multiple old SSTables, fulfilling read requests more efficiently than before compaction.
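The merge described above can be sketched as follows. This is a simplified model, not Cassandra's compaction code: each SSTable is a dictionary keyed by a (partition key, column name) tuple, a value of `None` stands for a tombstone, timestamps are plain seconds, and every SSTable of the table is assumed to take part in the compaction. The function name `compact_sstables` is made up for this example.

```python
def compact_sstables(sstables, now, gc_grace_seconds):
    """Merge SSTables into one, keeping the newest version per key.

    Tombstones older than gc_grace_seconds are evicted; younger ones are kept.
    """
    newest = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in newest or cell["ts"] > newest[key]["ts"]:
                newest[key] = cell

    merged = {}
    for key in sorted(newest):  # output stays sorted by partition key
        cell = newest[key]
        if cell["value"] is None and now - cell["ts"] > gc_grace_seconds:
            continue  # expired tombstone: neither it nor the old data survives
        merged[key] = cell
    return merged


if __name__ == "__main__":
    older = {
        (120, "message_map"): {"value": "{0: {...}}", "ts": 100},
        (121, "message_map"): {"value": "a", "ts": 100},
    }
    newer = {
        (120, "message_map"): {"value": None, "ts": 200},  # tombstone from a DELETE
        (121, "message_map"): {"value": "b", "ts": 150},   # later update
    }
    # Shortly after the delete: the tombstone is kept, the old value is gone.
    print(compact_sstables([older, newer], now=300, gc_grace_seconds=864000))
    # After the grace period: the tombstone is evicted as well.
    print(compact_sstables([older, newer], now=200 + 864001, gc_grace_seconds=864000))
```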

So try this:

nodetool <options> compact <keyspace> <table>

The options are:
( -h | --host ) <host name> | <ip address>
( -p | --port ) <port number>
( -pw | --password ) <password>
( -u | --username ) <user name>
-- separates an option and an argument that could be mistaken for an option.
keyspace is the name of a keyspace.
table is one or more table names, separated by a space.

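The nodetool compact command starts the compaction process on tables that use SizeTieredCompactionStrategy or DateTieredCompactionStrategy. You can specify a keyspace for compaction. If you do not specify a keyspace, the nodetool command uses the current keyspace. You can specify one or more tables for compaction. If you do not specify any tables, all tables in the keyspace are compacted. This is called a major compaction. If you do specify tables, only the specified tables are compacted. This is called a minor compaction. A major compaction consolidates all existing SSTables into a single SSTable. During compaction, there is a temporary spike in disk space usage and disk I/O because the old and new SSTables co-exist. A major compaction can cause considerable disk I/O.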
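If you script this, a small helper can assemble the nodetool compact command line from the options listed above. This is a sketch under assumptions: `build_compact_command` is a hypothetical helper, the keyspace `my_db` and table `conversations` come from the cqlsh session in the question, and 7199 is used only as an example port. The helper only builds the argument list; running it is left to the caller.

```python
def build_compact_command(keyspace=None, tables=(), host=None, port=None,
                          username=None, password=None):
    """Build the argument list for a nodetool compact invocation."""
    if tables and keyspace is None:
        raise ValueError("table names require a keyspace")

    cmd = ["nodetool"]
    if host is not None:
        cmd += ["-h", host]
    if port is not None:
        cmd += ["-p", str(port)]
    if username is not None:
        cmd += ["-u", username]
    if password is not None:
        cmd += ["-pw", password]

    cmd.append("compact")
    if keyspace is not None:
        cmd.append(keyspace)
    cmd.extend(tables)  # no tables: every table in the keyspace is compacted
    return cmd


if __name__ == "__main__":
    # Compact a single table.
    print(" ".join(build_compact_command("my_db", ["conversations"])))
    # Compact the whole keyspace on a given host and port.
    print(" ".join(build_compact_command("my_db", host="127.0.0.1", port=7199)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where nodetool is installed. Given the disk I/O cost mentioned above, it is worth trying this on a single table first.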
