Rank function using ORDER BY on TIMESTAMP_TZ(9) in Snowflake is not working properly
The Snowflake RANK function, ordered by identical ROW_MODIFIED_TMST values, is generating unique numbers.
For example:
Table1
Column1 ROW_MODIFIED_TMST
A 2022-04-03 17:42:41.009 +0000
b 2022-04-03 17:42:41.009 +0000
c 2022-04-03 17:42:41.009 +0000
d 2022-04-03 17:42:41.009 +0100
select
rank() over(partition by column1 order by ROW_MODIFIED_TMST desc) from table1
Column1 ROW_MODIFIED_TMST RANK
A 2022-04-03 17:42:41.009 +0000 1
b 2022-04-03 17:42:41.009 +0000 2
c 2022-04-03 17:42:41.009 +0000 3
d 2022-04-03 17:42:41.009 +0100 4
Here rank function should be 1,1,1,2 instead of 1,2,3,4
Please suggest
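In this example, the PARTITION BY on Column1 is the problem: every value is different (A, b, c, d), so each row ends up in its own partition.
To avoid gaps, use DENSE_RANK instead of RANK.
The code should be: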
select *, dense_rank() over(order by ROW_MODIFIED_TMST desc)
from table1
So, if we take your sample data and run your query against it:
with table1(Column1,ROW_MODIFIED_TMST) as (
SELECT * FROM VALUES
('A', '2022-04-03 17:42:41.009 +0000'::timestamp_tz),
('b', '2022-04-03 17:42:41.009 +0000'::timestamp_tz),
('c', '2022-04-03 17:42:41.009 +0000'::timestamp_tz),
('d', '2022-04-03 17:42:41.009 +0100'::timestamp_tz)
)
select
rank() over(partition by column1 order by ROW_MODIFIED_TMST desc) from table1
RANK() OVER(PARTITION BY COLUMN1 ORDER BY ROW_MODIFIED_TMST DESC) |
---|
1 |
1 |
1 |
1 |
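This is exactly what I expected, and what Lukazs pointed out.
But you said:
Here rank function should be 1,1,1,2 instead of 1,2,3,4
You do not get 1,2,3,4, though, and you should not get 1,1,1,2 either, because all four Column1 values are different.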
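To see why the partitioned query returns 1 for every row, it can help to count the rows in each partition. The following is a minimal sketch that reuses the sample data from the question; the aliases `rows_in_partition` and `rnk` are made up for illustration.

```sql
-- Sample data copied from the question.
with table1(Column1, ROW_MODIFIED_TMST) as (
    select * from values
    ('A', '2022-04-03 17:42:41.009 +0000'::timestamp_tz),
    ('b', '2022-04-03 17:42:41.009 +0000'::timestamp_tz),
    ('c', '2022-04-03 17:42:41.009 +0000'::timestamp_tz),
    ('d', '2022-04-03 17:42:41.009 +0100'::timestamp_tz)
)
select
    column1,
    -- How many rows share this Column1 value (illustrative alias).
    count(*) over (partition by column1) as rows_in_partition,
    -- Same window as in the question: one partition per Column1 value.
    rank() over (partition by column1 order by ROW_MODIFIED_TMST desc) as rnk
from table1
order by column1;
```

Each partition holds a single row, so RANK has nothing to compare against and returns 1 every time.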
Now, if you remove the PARTITION BY on those four distinct Column1 values, we can see how the two types of RANK work, compared with ROW_NUMBER():
select
rank() over(order by ROW_MODIFIED_TMST desc) as sparse,
dense_rank() over(order by ROW_MODIFIED_TMST desc) as dense,
row_number() over(order by ROW_MODIFIED_TMST desc) as rn
from table1
gives:
SPARSE | DENSE | RN |
---|---|---|
1 | 1 | 1 |
1 | 1 | 2 |
1 | 1 | 3 |
4 | 2 | 4 |
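The fourth row sorts last under DESC because TIMESTAMP_TZ values are compared as instants in UTC rather than by their wall-clock text. A small sketch using the two distinct literals from the sample data (the column alias is illustrative); the comparison should evaluate to TRUE:

```sql
select
    -- 17:42:41.009 at +0100 is 16:42:41.009 in UTC, one hour before
    -- 17:42:41.009 at +0000, so the left-hand value is the earlier instant.
    '2022-04-03 17:42:41.009 +0100'::timestamp_tz
        < '2022-04-03 17:42:41.009 +0000'::timestamp_tz as plus_0100_is_earlier;
```

That is why rows A, b and c tie for first place while d comes after them, with RANK 4 (a gap after the three-way tie) and DENSE_RANK 2.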