How to get number of IDs in the current month that also appears in the previous three months in Snowflake - SQL

I have a table in Snowflake whose dates range from 2019-01 to 2020-01. An ID can appear multiple times (i.e., have several matching rows), on any date.

For example, my_table has two columns, dddate and id:

dddate id
2019-02-03 607
2019-01-07 356
2019-08-06 491
2019-01-01 607
2019-12-17 529
2019-04-15 356

......

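Is there a way to find the total number of IDs that appear at least once in the current month and at least once in the previous three months, grouped by month, so that I get a count for each month from 2019-04 (the first month for which the table has three prior months of data) through 2020-01?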
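One way to pin down the intended definition before writing any SQL is a small pure-Python reference over the six sample rows above. It assumes "the previous three months" means the three calendar months immediately before the current one; `count_returning_ids`, `month_index` and `SAMPLE` are illustrative names, not part of the question.

```python
from collections import defaultdict
from datetime import date

# The six sample rows from the question: (dddate, id).
SAMPLE = [
    (date(2019, 2, 3), 607),
    (date(2019, 1, 7), 356),
    (date(2019, 8, 6), 491),
    (date(2019, 1, 1), 607),
    (date(2019, 12, 17), 529),
    (date(2019, 4, 15), 356),
]


def month_index(d: date) -> int:
    """Map a date to a running month number so month arithmetic is subtraction."""
    return d.year * 12 + (d.month - 1)


def count_returning_ids(rows):
    """For each month present in rows, count distinct ids that also appear
    in at least one of the three preceding calendar months."""
    ids_by_month = defaultdict(set)
    for d, id_ in rows:
        ids_by_month[month_index(d)].add(id_)

    result = {}
    for m in sorted(ids_by_month):
        # Union of the ids seen in months m-1, m-2 and m-3 (.get does not insert keys).
        earlier = set().union(*(ids_by_month.get(m - k, set()) for k in (1, 2, 3)))
        label = f"{m // 12:04d}-{m % 12 + 1:02d}"
        result[label] = len(ids_by_month[m] & earlier)
    return result


print(count_returning_ids(SAMPLE))
```

With only these six rows, 2019-02 (ID 607) and 2019-04 (ID 356) each have one returning ID and every other month has zero; limiting the report to 2019-04 through 2020-01 is then just a filter on the keys.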

I was thinking of code like this:

WITH PREV_THREE AS (
SELECT 
  DATE_TRUNC('MONTH', dddate) AS MONTH, 
  ID AS CURR_ID
FROM my_table mt 
INNER JOIN
(
(
SELECT 
  MONTH(DATEADD(DATE_TRUNC('MONTH', dddate), -1, GETDATE())) AS PREV_MONTH, 
  ID AS PREV_3_MON_ID
FROM my_table
)
UNION ALL
(
SELECT 
  MONTH(DATEADD(DATE_TRUNC('MONTH', dddate), -2, GETDATE())) AS PREV_MONTH, 
  ID AS PREV_3_MON_ID
FROM my_table
)
UNION ALL
(
SELECT 
  MONTH(DATEADD(DATE_TRUNC('MONTH', dddate), -3, GETDATE())) AS PREV_MONTH, 
  ID AS PREV_3_MON_ID
FROM my_table 
)
) AS PREV_3_MON
ON mt.CURR_ID = PREV_3_MON.PREV_3_MON_ID
)
SELECT MONTH, COUNT(DISTINCT ID) AS COUNTER
FROM PREV_THREE
GROUP BY 1
ORDER BY 1

However, it returns an error and doesn't seem to work. Can anyone help me fix this? Thanks in advance!
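For comparison, the self-join idea in the attempt above can work once each (month, id) pair is deduplicated and the join compares month start dates instead of month numbers. Likely problems in the original include the `DATEADD` calls (Snowflake expects `DATEADD(<date_part>, <value>, <date>)`, and `MONTH(...)` only returns a number from 1 to 12), the reference to `mt.CURR_ID`, which is a select-list alias rather than a column of `my_table`, and an `ON` clause that matches on id only. The sketch below is a SQLite port run through Python's `sqlite3` so it can be tested locally; `strftime`/`date(..., '-3 months')` stand in for Snowflake's `DATE_TRUNC`/`DATEADD`, and `SELF_JOIN_SQL` and `run` are hypothetical names.

```python
import sqlite3

# The six sample rows from the question: (dddate, id).
SAMPLE = [
    ("2019-02-03", 607),
    ("2019-01-07", 356),
    ("2019-08-06", 491),
    ("2019-01-01", 607),
    ("2019-12-17", 529),
    ("2019-04-15", 356),
]

# SQLite port (assumption): strftime()/date() replace DATE_TRUNC/DATEADD.
SELF_JOIN_SQL = """
WITH monthly AS (
    -- one row per (month, id), so repeated rows within a month do not matter
    SELECT DISTINCT strftime('%Y-%m-01', dddate) AS month, id
    FROM my_table
)
SELECT cur.month, COUNT(DISTINCT cur.id) AS counter
FROM monthly AS cur
JOIN monthly AS prev
  ON prev.id = cur.id
 AND prev.month >= date(cur.month, '-3 months')  -- start of the 3-month window
 AND prev.month <  cur.month                      -- strictly before this month
WHERE cur.month BETWEEN '2019-04-01' AND '2020-01-01'
GROUP BY cur.month
ORDER BY cur.month
"""


def run(sql, rows):
    """Load rows into an in-memory my_table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (dddate TEXT, id INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(run(SELF_JOIN_SQL, SAMPLE))  # [('2019-04-01', 1)]
```

In Snowflake the same shape would use `DATE_TRUNC('MONTH', dddate)` in the CTE and `DATEADD(month, -3, cur.month)` in the join condition.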

您可以使用 lag():

select distinct id
from (select t.*,
             lag(dddate) over (partition by id order by dddate) as prev_dddate
      from my_table t
     ) t
where dddate >= date_trunc('MONTH', current_date) and
      prev_dddate < date_trunc('MONTH', current_date) and
      prev_dddate >= date_trunc('MONTH', current_date) - interval '3 month';

You can do the same for multiple months:

select date_trunc('MONTH', dddate) as month, count(distinct id) as counter
from (select t.*,
             lag(dddate) over (partition by id order by dddate) as prev_dddate
      from my_table t
     ) t
where prev_dddate < date_trunc('MONTH', dddate) and
      prev_dddate >= date_trunc('MONTH', dddate) - interval '3 month'
group by date_trunc('MONTH', dddate)
order by 1;
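To check the multi-month query locally, here is a SQLite port run through Python's `sqlite3` on the sample rows. It assumes SQLite 3.25 or newer (required for `LAG`) and swaps `strftime('%Y-%m-01', ...)` and `date(..., '-3 months')` in for `date_trunc('MONTH', ...)` and `interval '3 month'`; `LAG_SQL` and `run` are illustrative names.

```python
import sqlite3

# The six sample rows from the question: (dddate, id).
SAMPLE = [
    ("2019-02-03", 607),
    ("2019-01-07", 356),
    ("2019-08-06", 491),
    ("2019-01-01", 607),
    ("2019-12-17", 529),
    ("2019-04-15", 356),
]

# SQLite port of the lag() answer (assumption: strftime/date replace date_trunc/interval).
LAG_SQL = """
SELECT strftime('%Y-%m-01', dddate) AS month, COUNT(DISTINCT id) AS counter
FROM (
    SELECT t.*,
           LAG(dddate) OVER (PARTITION BY id ORDER BY dddate) AS prev_dddate
    FROM my_table AS t
) AS t
-- previous appearance must fall in the three months before this row's month
WHERE prev_dddate <  strftime('%Y-%m-01', dddate)
  AND prev_dddate >= date(strftime('%Y-%m-01', dddate), '-3 months')
GROUP BY month
ORDER BY month
"""


def run(sql, rows):
    """Load rows into an in-memory my_table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (dddate TEXT, id INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(run(LAG_SQL, SAMPLE))  # [('2019-02-01', 1), ('2019-04-01', 1)]
```

Months with no returning IDs simply do not appear in the result, because no row for them passes the `where` clause; the question's 2019-04 to 2020-01 range can be applied as an extra filter on the month.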

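Even if an id appears multiple times in a month, one of those rows comes first, and for that row lag() returns the id's most recent earlier appearance, i.e. the most recent previous month in which it occurred.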
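To see why repeated rows within a month are harmless, the sketch below uses hypothetical rows in which ID 607 appears twice in February 2019 and prints the `lag()` value of each row (run through Python's `sqlite3`, assuming SQLite 3.25 or newer). Only the first February row looks back into January; the later one looks back into February itself and is rejected by the "previous date is before the start of this month" condition. `ROWS` and `lag_values` are illustrative names.

```python
import sqlite3

# Hypothetical rows: ID 607 appears once in January and twice in February 2019.
ROWS = [("2019-01-01", 607), ("2019-02-03", 607), ("2019-02-20", 607)]


def lag_values(rows):
    """Return (dddate, id, prev_dddate) for every row, ordered by id and date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (dddate TEXT, id INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT dddate, id,
               LAG(dddate) OVER (PARTITION BY id ORDER BY dddate) AS prev_dddate
        FROM my_table
        ORDER BY id, dddate
        """
    ).fetchall()
    con.close()
    return result


for row in lag_values(ROWS):
    print(row)
# ('2019-01-01', 607, None)
# ('2019-02-03', 607, '2019-01-01')
# ('2019-02-20', 607, '2019-02-03')
```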