ClickHouse: generating an array of dates between two dates at a given interval
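I've been using ClickHouse a lot lately and asking a lot of questions; I hope someone can save me from this misery.
I have multiple dates in my database,
e.g. my database has a date for every day in May (2020-05-01 ~ 2020-05-31).
I want to set May 1st and May 31st as the start date and end date
and get, in a SELECT, an array of dates at a certain interval.
For example: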
SELECT (some query about setting 2 timestamps as start / end and interval of 5 days)
Then the expected result would be
2020-05-01
2020-05-05
2020-05-10
2020-05-15 . . .goes on till 30
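I want this interval to be in months, days, hours, minutes, seconds, or milliseconds.
But as far as my research on ClickHouse goes, can we really not use milliseconds in ClickHouse?
If so, do I have to convert the dates to UInt64, do some interval trick to get a UInt64 result, and then convert it back to a datetime?
Please help me :(
Sample input data: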
start_at
2020-01-14 18:04:36.000
2020-01-14 19:07:48.000
2020-01-14 20:46:48.000
2020-01-14 23:21:12.000
2020-01-15 00:02:00.000
2020-01-15 03:36:00.000
2020-01-15 04:54:24.000
2020-01-15 08:04:00.000
2020-01-15 09:04:00.000
2020-01-15 10:04:00.000
2020-01-15 11:04:00.000
2020-01-15 14:04:00.000
2020-01-15 18:04:00.000
2020-01-16 11:04:00.000
2020-01-16 17:04:00.000
2020-01-16 17:04:00.000
2020-01-17 11:04:00.000
2020-01-17 18:04:00.000
2020-01-17 20:04:00.000
2020-01-18 01:04:00.000
2020-01-18 15:04:00.000
Expected result (e.g. with a 2-day interval)
time count
2020-01-14 18:04:36.000
2020-01-16 18:04:36.000
2020-01-18 18:04:36.000
or with a 1-day interval
time count
2020-01-14 18:04:36.000
2020-01-15 18:04:36.000
2020-01-16 18:04:36.000
2020-01-17 18:04:36.000
2020-01-18 18:04:36.000
or with a 12-hour interval
time count
2020-01-14 18:04:36.000
2020-01-15 06:04:36.000
2020-01-15 18:04:36.000
2020-01-16 06:04:36.000
2020-01-16 18:04:36.000
2020-01-17 06:04:36.000
2020-01-17 18:04:36.000
2020-01-18 06:04:36.000
2020-01-18 18:04:36.000
Try this query:
WITH
    toDateTime64('2020-01-14 18:04:36.000', 3) AS start_from,
    toUnixTimestamp64Milli(start_from) AS start_from_ts,
    ((12 * 60) * 60) * 1000 AS interval_msec
SELECT
    fromUnixTimestamp64Milli(toInt64(ts)) dt,
    count
FROM (
    SELECT
        start_from_ts + interval_msec * interval_number AS ts,
        floor((toUnixTimestamp64Milli(start_at) - start_from_ts) / interval_msec) AS interval_number,
        count() AS count
    FROM
    (
        /* emulate the test dataset */
        SELECT toDateTime64(dt, 3) AS start_at
        FROM (
            SELECT arrayJoin([
                ('2020-01-14 18:04:36.000'),
                ('2020-01-14 19:07:48.000'),
                ('2020-01-14 20:46:48.000'),
                ('2020-01-14 23:21:12.000'),
                ('2020-01-15 00:02:00.000'),
                ('2020-01-15 03:36:00.000'),
                ('2020-01-15 04:54:24.000'),
                ('2020-01-15 08:04:00.000'),
                ('2020-01-15 09:04:00.000'),
                ('2020-01-15 10:04:00.000'),
                ('2020-01-15 11:04:00.000'),
                ('2020-01-15 14:04:00.000'),
                ('2020-01-15 18:04:00.000'),
                ('2020-01-16 11:04:00.000'),
                ('2020-01-16 17:04:00.000'),
                ('2020-01-16 17:04:00.000'),
                ('2020-01-17 11:04:00.000'),
                ('2020-01-17 18:04:00.000'),
                ('2020-01-17 20:04:00.000'),
                ('2020-01-18 01:04:00.000'),
                ('2020-01-18 15:04:00.000')]) dt)
    )
    WHERE start_at >= start_from
    GROUP BY interval_number
    ORDER BY ts WITH FILL
        FROM toUnixTimestamp64Milli(toDateTime64('2020-01-14 18:04:36.000', 3))
        TO toUnixTimestamp64Milli(toDateTime64('2020-01-18 18:04:36.000', 3))
        STEP ((12 * 60) * 60) * 1000
)
/* result
┌──────────────────────dt─┬─count─┐
│ 2020-01-14 18:04:36.000 │     7 │
│ 2020-01-15 06:04:36.000 │     6 │
│ 2020-01-15 18:04:36.000 │     0 │
│ 2020-01-16 06:04:36.000 │     3 │
│ 2020-01-16 18:04:36.000 │     0 │
│ 2020-01-17 06:04:36.000 │     2 │
│ 2020-01-17 18:04:36.000 │     2 │
│ 2020-01-18 06:04:36.000 │     1 │
└─────────────────────────┴───────┘
*/
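The query above computes the counts for a 12-hour interval.
To apply it to other intervals, change:
- interval_msec in the WITH clause
- the FROM, TO and STEP values in ORDER BY ... WITH FILL.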
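
If only the series of points between a start and an end is needed, without counting rows, a minimal sketch could build it with range and arrayJoin instead of WITH FILL. This is an illustration under assumptions rather than a tested recipe: the names start_ts, end_ts and n are hypothetical, and it assumes a ClickHouse version with DateTime64 and the toUnixTimestamp64Milli / fromUnixTimestamp64Milli functions used in the 12-hour query. Timestamps are parsed and printed in the server time zone.

```sql
WITH
    /* start and end of the series as Int64 milliseconds since the epoch */
    toUnixTimestamp64Milli(toDateTime64('2020-01-14 18:04:36.000', 3)) AS start_ts,
    toUnixTimestamp64Milli(toDateTime64('2020-01-18 18:04:36.000', 3)) AS end_ts,
    /* step in milliseconds: 12 hours here */
    toInt64(((12 * 60) * 60) * 1000) AS interval_msec
SELECT
    /* n takes the values 0, 1, 2, ... for every step that fits up to and including end_ts */
    arrayJoin(range(toUInt64(intDiv(end_ts - start_ts, interval_msec) + 1))) AS n,
    /* convert each step back to a DateTime64(3) value */
    fromUnixTimestamp64Milli(toInt64(start_ts + interval_msec * toInt64(n))) AS dt
ORDER BY n
```

With this shape, changing the step only touches interval_msec, and the end point is included when the range divides evenly. Calendar units such as months are not a fixed number of milliseconds, so for those something like addMonths(start, n) over the same n is probably a better fit than a millisecond step.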