Top 10 products for each date in a date range
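I'm trying to find a better way to write a single query that returns the top 10 products for each date. I have a table with two columns, PID (int) and EventDate (date), with one row per click.
Could you suggest how to get the top 10 most-clicked products for each date in a date range? I'm struggling to see how the subquery has to be structured. I can get as far as grouping by date, but then I get stuck on count() and the aggregation.
Here is my query for a single date, but I need to query a date range. Of course, I could use a subquery that generates the event dates, but I'd like to find a more elegant way.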
SELECT TOP 10 COUNT() as count, PID
FROM view_product
WHERE EventDate = toDate('2020-05-11')
GROUP BY PID
ORDER BY count DESC
The expected output looks like this:
PID Count Date
1 123 2020-02-04
21 101 2020-02-04
1332 99 2020-02-04
11 51 2020-02-04
634 49 2020-02-04
1332 43 2020-02-04
1 24 2020-02-04
21 23 2020-02-04
1332 6 2020-02-04
11 3 2020-02-04
1 266 2020-02-02
21 241 2020-02-02
1332 232 2020-02-02
11 179 2020-02-02
634 163 2020-02-02
1332 159 2020-02-02
1 144 2020-02-02
21 100 2020-02-02
1332 99 2020-02-02
11 74 2020-02-02
This calls for the LIMIT BY clause, which keeps the first 10 rows for each day:
SELECT
PID,
EventDate,
count() AS Count
FROM view_product
WHERE EventDate >= '2020-05-01' AND EventDate < '2020-06-01'
GROUP BY EventDate, PID
ORDER BY EventDate, Count DESC
LIMIT 10 BY EventDate;
Test example:
SELECT
PID,
EventDate,
count() AS Count
FROM (
/* emulate test set */
SELECT test_data.1 AS PID, toDate(test_data.2) AS EventDate
FROM (
SELECT arrayJoin([
(1, '2020-02-04'),
(21, '2020-02-04'),
(1332, '2020-02-04'),
(11, '2020-02-04'),
(634, '2020-02-04'),
(1, '2020-02-04'),
(1, '2020-02-04'),
(21, '2020-02-04'),
(1, '2020-02-04'),
(1, '2020-02-02'),
(21, '2020-02-02'),
(11, '2020-02-02'),
(1332, '2020-02-02'),
(1332, '2020-02-02'),
(1332, '2020-02-02'),
(11, '2020-02-02')]) test_data))
GROUP BY EventDate, PID
ORDER BY EventDate, Count DESC
LIMIT 2 BY EventDate;
/* result
┌──PID─┬──EventDate─┬─Count─┐
│ 1332 │ 2020-02-02 │ 3 │
│ 11 │ 2020-02-02 │ 2 │
│ 1 │ 2020-02-04 │ 4 │
│ 21 │ 2020-02-04 │ 2 │
└──────┴────────────┴───────┘
*/
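To make the semantics of LIMIT n BY concrete, here is a minimal Python sketch that emulates the three steps of the query (group and count, sort, then keep the first n rows of each date) on the same test rows. The helper name top_n_per_date is hypothetical and this is only an illustration of the logic, not how ClickHouse executes it.

```python
from collections import Counter
from itertools import groupby

# Same rows as the test set above: (PID, EventDate), one row per click.
clicks = [
    (1, "2020-02-04"), (21, "2020-02-04"), (1332, "2020-02-04"),
    (11, "2020-02-04"), (634, "2020-02-04"), (1, "2020-02-04"),
    (1, "2020-02-04"), (21, "2020-02-04"), (1, "2020-02-04"),
    (1, "2020-02-02"), (21, "2020-02-02"), (11, "2020-02-02"),
    (1332, "2020-02-02"), (1332, "2020-02-02"), (1332, "2020-02-02"),
    (11, "2020-02-02"),
]


def top_n_per_date(rows, n):
    """Hypothetical helper: mimic GROUP BY + ORDER BY + LIMIT n BY EventDate."""
    # GROUP BY EventDate, PID with count()
    counts = Counter((event_date, pid) for pid, event_date in rows)
    # ORDER BY EventDate, Count DESC (ISO date strings sort chronologically)
    ordered = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1]))
    result = []
    # LIMIT n BY EventDate: keep the first n rows of each date, in sort order
    for event_date, group in groupby(ordered, key=lambda kv: kv[0][0]):
        for (_, pid), count in list(group)[:n]:
            result.append((pid, event_date, count))
    return result


top2 = top_n_per_date(clicks, 2)
for row in top2:
    print(row)
```

With n = 2 this yields the same four rows as the result table above. When several products share the same count on one date, which of them makes the cut depends on the sort order, so adding a tie-breaker such as PID to ORDER BY is likely worthwhile if reproducible output matters.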
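To get only the top n items without their counts, use the topK aggregate function (note that topK returns an approximation of the most frequent values):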
SELECT
EventDate,
topK(10)(PID)
FROM (
/* emulate test set */
SELECT test_data.1 AS PID, toDate(test_data.2) AS EventDate
FROM (
SELECT arrayJoin([
(1, '2020-02-04'),
(21, '2020-02-04'),
(1332, '2020-02-04'),
(11, '2020-02-04'),
(634, '2020-02-04'),
(1, '2020-02-04'),
(1, '2020-02-04'),
(21, '2020-02-04'),
(1, '2020-02-04'),
(1, '2020-02-02'),
(21, '2020-02-02'),
(11, '2020-02-02'),
(1332, '2020-02-02'),
(1332, '2020-02-02'),
(1332, '2020-02-02'),
(11, '2020-02-02')]) test_data))
GROUP BY EventDate;
/* result
┌──EventDate─┬─topK(10)(PID)──────┐
│ 2020-02-02 │ [1332,11,1,21] │
│ 2020-02-04 │ [1,21,1332,11,634] │
└────────────┴────────────────────┘
*/
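For comparison, the following Python sketch computes an exact top-k list of PIDs per date on the same test rows. It is an assumption-laden illustration: the helper name top_k_pids_by_date is hypothetical, and unlike ClickHouse's topK it counts every value exactly instead of approximating.

```python
from collections import Counter, defaultdict

# Same rows as the test set above: (PID, EventDate), one row per click.
clicks = [
    (1, "2020-02-04"), (21, "2020-02-04"), (1332, "2020-02-04"),
    (11, "2020-02-04"), (634, "2020-02-04"), (1, "2020-02-04"),
    (1, "2020-02-04"), (21, "2020-02-04"), (1, "2020-02-04"),
    (1, "2020-02-02"), (21, "2020-02-02"), (11, "2020-02-02"),
    (1332, "2020-02-02"), (1332, "2020-02-02"), (1332, "2020-02-02"),
    (11, "2020-02-02"),
]


def top_k_pids_by_date(rows, k):
    """Hypothetical helper: exact per-date list of the k most frequent PIDs."""
    per_date = defaultdict(Counter)
    for pid, event_date in rows:
        per_date[event_date][pid] += 1
    # most_common(k) sorts by count descending; ties keep first-seen order
    return {
        event_date: [pid for pid, _ in counter.most_common(k)]
        for event_date, counter in sorted(per_date.items())
    }


top10 = top_k_pids_by_date(clicks, 10)
for event_date, pids in top10.items():
    print(event_date, pids)
```

On this small data set the exact lists happen to match the topK output above. On large, high-cardinality data the two may differ, because topK trades exactness for bounded memory.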