Cassandra clustering with timestamp doesn't work as expected if the primary key contains (timeuuid and timestamp)

Question

I am using Cassandra 2.1.5.

I created a table with:

create table dummy2(  
  id timeuuid,  
  time timestamp,  
  primary key (id, time) 
) with clustering order by (time desc);

I inserted 4 records into the table:

insert into dummy2 (id, time) values (now(), 1000000);  
insert into dummy2 (id, time) values (now(), 2000000);  
insert into dummy2 (id, time) values (now(), 3000000);  
insert into dummy2 (id, time) values (now(), 4000000);

I get this result:

 id                                   | time  
--------------------------------------+--------------------------  
 e1fa7a80-1e64-11e5-8bf5-55cdf06f740f | 1970-01-01 08:33:20+0800  
 e3bbb280-1e64-11e5-8bf5-55cdf06f740f | 1970-01-01 08:50:00+0800  
 e5ceb400-1e64-11e5-8bf5-55cdf06f740f | 1970-01-01 09:06:40+0800  
 e0719090-1e64-11e5-8bf5-55cdf06f740f | 1970-01-01 08:16:40+0800

This looks like a tree map order, or random...

If I change the type of id from "timeuuid" to "text", the ordering works as expected:

 id    | time
-------+--------------------------
 hello | 1970-01-01 09:06:40+0800
 hello | 1970-01-01 08:50:00+0800
 hello | 1970-01-01 08:33:20+0800
 hello | 1970-01-01 08:16:40+0800

Is this by design, or is it a bug? Or am I using it incorrectly?

Answer 1

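Yes, this is how Cassandra is designed to work. Clustering order applies only within a partition. This is because each partition key is hashed into a token to determine where in the cluster it should be stored (to provide the best data distribution). The rows within each partition are then written to disk in their clustering order.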

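So in your first example, each row is ordered by time within its id. Of course, since every partition key (id) is different, each partition holds a single row and you cannot see that ordering. In your second example, however, the partition key is the same for all rows, so your results are clustered by time.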
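The behaviour described above can be imitated with a small Python sketch. It is only a toy model, not Cassandra's implementation: `toy_token` is a hypothetical stand-in that uses MD5 where Cassandra's default partitioner uses Murmur3, and the partition keys `id-a` to `id-d` are made-up placeholders for the four timeuuid values. The point is just that a full scan walks partitions in token order and applies the clustering order only inside each partition.

```python
import hashlib
from collections import defaultdict


def toy_token(partition_key):
    """Hypothetical stand-in for the partitioner.

    Maps a partition key to a signed 64-bit integer. Cassandra's default
    partitioner uses Murmur3; MD5 is used here purely for illustration.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def scan(rows):
    """Return (id, time) rows as a full-table scan would in this toy model.

    Partitions come back in ascending token order; rows inside one
    partition come back by time descending (the clustering order).
    """
    partitions = defaultdict(list)
    for partition_key, time in rows:
        partitions[partition_key].append(time)

    result = []
    for partition_key in sorted(partitions, key=toy_token):
        for time in sorted(partitions[partition_key], reverse=True):
            result.append((partition_key, time))
    return result


times = (1000000, 2000000, 3000000, 4000000)

# Case 1: every row has its own partition key (like a fresh timeuuid per row).
distinct = list(zip(("id-a", "id-b", "id-c", "id-d"), times))

# Case 2: all rows share one partition key (like id = 'hello').
shared = [("hello", t) for t in times]

for label, rows in (("distinct ids", distinct), ("shared id", shared)):
    print(label)
    for partition_key, time in scan(rows):
        print(f"  {toy_token(partition_key):>21} | {partition_key} | {time}")
```

With a shared id the times come back strictly descending. With distinct ids the rows follow the token values, so the time column appears in no particular order.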

"which looks like a tree map order, or random..."

They are ordered by their hashed token value, which you can see by using the token function:

aploetz@cqlsh:stackoverflow2> SELECT token(id),id,time FROM dummy3;

 token(id)            | id    | time
----------------------+-------+--------------------------
 -3758069500696749310 | hello | 1969-12-31 19:06:40-0600
 -3758069500696749310 | hello | 1969-12-31 18:50:00-0600
 -3758069500696749310 | hello | 1969-12-31 18:33:20-0600
 -3758069500696749310 | hello | 1969-12-31 18:16:40-0600

(4 rows)

Or, as a better example:

aploetz@cqlsh:stackoverflow2> SELECT token(id),id,time FROM dummy2;

 token(id)            | id                                   | time
----------------------+--------------------------------------+--------------------------
 -5795426230130619993 | e1fa7a80-1e64-11e5-8bf5-55cdf06f740f | 1969-12-31 18:33:20-0600
 -2088884548269216731 | e3bbb280-1e64-11e5-8bf5-55cdf06f740f | 1969-12-31 18:50:00-0600
  8496311684589314797 | e5ceb400-1e64-11e5-8bf5-55cdf06f740f | 1969-12-31 19:06:40-0600
  8930307282139899213 | e0719090-1e64-11e5-8bf5-55cdf06f740f | 1969-12-31 18:16:40-0600

(4 rows)

Earlier this year I wrote an article for PlanetCassandra on this often-misunderstood topic: We Shall Have Order! Give it a read and see if it helps point you in the right direction.

cassandra

cqlsh