ClickHouse: generating an array of dates between two dates at a given interval

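I've been working with ClickHouse a lot recently and asking many questions about it; I hope someone can rescue me from this misery.

I have multiple dates in my database.

For example, my database has a date for every day in May (2020-05-01 to 2020-05-31).

I'd like to set May 1 and May 31 as the start and end dates

and SELECT an array of dates at a certain interval between them.

For example:

SELECT (some query about setting 2 timestamps as start / end and interval of 5 days)

Then the expected result would be: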

2020-05-01
2020-05-05
2020-05-10
2020-05-15 ... and so on until the 30th

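I'd like the interval to be expressible in months, days, hours, minutes, seconds, or milliseconds.

But from what I've read about ClickHouse, is it true that we really can't use milliseconds there?

If so, do I have to convert the dates to UInt64, do some interval trick to get a UInt64 result, and then convert that back to a datetime?

Please help me :(

Sample input data: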

start_at
2020-01-14 18:04:36.000
2020-01-14 19:07:48.000
2020-01-14 20:46:48.000
2020-01-14 23:21:12.000
2020-01-15 00:02:00.000
2020-01-15 03:36:00.000
2020-01-15 04:54:24.000
2020-01-15 08:04:00.000
2020-01-15 09:04:00.000
2020-01-15 10:04:00.000
2020-01-15 11:04:00.000
2020-01-15 14:04:00.000
2020-01-15 18:04:00.000
2020-01-16 11:04:00.000
2020-01-16 17:04:00.000
2020-01-16 17:04:00.000
2020-01-17 11:04:00.000
2020-01-17 18:04:00.000
2020-01-17 20:04:00.000
2020-01-18 01:04:00.000
2020-01-18 15:04:00.000

Expected result (for example, with an interval of 2 days):

time                     count
2020-01-14 18:04:36.000
2020-01-16 18:04:36.000
2020-01-18 18:04:36.000 

Or with an interval of 1 day:

time                     count
2020-01-14 18:04:36.000
2020-01-15 18:04:36.000
2020-01-16 18:04:36.000
2020-01-17 18:04:36.000
2020-01-18 18:04:36.000

Or 12 hours:

time                     count
2020-01-14 18:04:36.000
2020-01-15 06:04:36.000
2020-01-15 18:04:36.000
2020-01-16 06:04:36.000
2020-01-16 18:04:36.000
2020-01-17 06:04:36.000
2020-01-17 18:04:36.000
2020-01-18 06:04:36.000
2020-01-18 18:04:36.000
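
For the simpler request at the top of the question (a plain list of dates between a start and an end date at a fixed step), one possible sketch follows. It assumes the step is a whole number of days and relies on the ClickHouse functions `dateDiff`, `range`, `arrayMap`, `addDays` and `arrayJoin`; the aliases `start_date`, `end_date` and `step_days` are illustrative names, not taken from the original post.

```sql
WITH
    toDate('2020-05-01') AS start_date,   -- assumed start of the range
    toDate('2020-05-31') AS end_date,     -- assumed end of the range (inclusive)
    5 AS step_days                        -- assumed step, in whole days
SELECT
    arrayJoin(                            -- unfold the array into one row per date
        arrayMap(
            i -> addDays(start_date, i * step_days),
            -- number of steps that fit between the two dates, plus the start itself
            range(toUInt32(intDiv(dateDiff('day', start_date, end_date), step_days) + 1))
        )
    ) AS dt
```

With a strict 5-day step starting on 2020-05-01 this should produce 2020-05-01, 2020-05-06, 2020-05-11 and so on up to 2020-05-31. Dropping the `arrayJoin` wrapper keeps the dates as a single array value instead of separate rows.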

Try this query:

WITH
    toDateTime64('2020-01-14 18:04:36.000', 3) AS start_from,
    toUnixTimestamp64Milli(start_from) AS start_from_ts,
    ((12 * 60) * 60) * 1000 AS interval_msec
SELECT
  fromUnixTimestamp64Milli(toInt64(ts)) dt,
  count
FROM (    
  SELECT
      start_from_ts + interval_msec * interval_number AS ts,
      floor((toUnixTimestamp64Milli(start_at) - start_from_ts) / interval_msec) AS interval_number,
      count() AS count
  FROM 
  (
    /* emulate the test dataset */
    SELECT toDateTime64(dt, 3) AS start_at
      FROM (
        SELECT arrayJoin([
          ('2020-01-14 18:04:36.000'),
          ('2020-01-14 19:07:48.000'),
          ('2020-01-14 20:46:48.000'),
          ('2020-01-14 23:21:12.000'),
          ('2020-01-15 00:02:00.000'),
          ('2020-01-15 03:36:00.000'),
          ('2020-01-15 04:54:24.000'),
          ('2020-01-15 08:04:00.000'),
          ('2020-01-15 09:04:00.000'),
          ('2020-01-15 10:04:00.000'),
          ('2020-01-15 11:04:00.000'),
          ('2020-01-15 14:04:00.000'),
          ('2020-01-15 18:04:00.000'),
          ('2020-01-16 11:04:00.000'),
          ('2020-01-16 17:04:00.000'),
          ('2020-01-16 17:04:00.000'),
          ('2020-01-17 11:04:00.000'),
          ('2020-01-17 18:04:00.000'),
          ('2020-01-17 20:04:00.000'),
          ('2020-01-18 01:04:00.000'),
          ('2020-01-18 15:04:00.000')]) dt)
    )
  WHERE start_at >= start_from
  GROUP BY interval_number
  ORDER BY ts WITH FILL
    FROM toUnixTimestamp64Milli(toDateTime64('2020-01-14 18:04:36.000', 3))
    TO toUnixTimestamp64Milli(toDateTime64('2020-01-18 18:04:36.000', 3))
    STEP ((12 * 60) * 60) * 1000
  )

/* result
┌──────────────────────dt─┬─count─┐
│ 2020-01-14 18:04:36.000 │     7 │
│ 2020-01-15 06:04:36.000 │     6 │
│ 2020-01-15 18:04:36.000 │     0 │
│ 2020-01-16 06:04:36.000 │     3 │
│ 2020-01-16 18:04:36.000 │     0 │
│ 2020-01-17 06:04:36.000 │     2 │
│ 2020-01-17 18:04:36.000 │     2 │
│ 2020-01-18 06:04:36.000 │     1 │
└─────────────────────────┴───────┘
*/

The query above computes the values for a 12-hour interval.
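
The bucket arithmetic that query relies on can be checked in isolation on a single timestamp. The following is only a sketch under assumptions: `sample` and `start_ts` are hypothetical aliases, and `intDiv` is swapped in for the `floor(... / ...)` used in the answer so that the intermediate value stays an integer.

```sql
WITH
    toDateTime64('2020-01-14 18:04:36.000', 3) AS start_from,
    toDateTime64('2020-01-15 08:04:00.000', 3) AS sample,     -- one row from the test data
    toUnixTimestamp64Milli(start_from) AS start_ts,           -- start of the range in milliseconds
    ((12 * 60) * 60) * 1000 AS interval_msec                  -- 12 hours in milliseconds
SELECT
    -- how many whole intervals lie between the start and the sample
    intDiv(toUnixTimestamp64Milli(sample) - start_ts, interval_msec) AS interval_number,
    -- left edge of the bucket the sample falls into
    fromUnixTimestamp64Milli(toInt64(start_ts + interval_number * interval_msec)) AS bucket_start
```

For this sample the interval number should come out as 1 and the bucket start as 2020-01-15 06:04:36.000, which is the second row of the result shown above.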

To apply the counting query to other intervals, you need to change: