Mode-within-group equivalent in Presto
In Postgres, the following query returns each customer's most frequently purchased cheese:
SELECT
    customer,
    MODE() WITHIN GROUP (ORDER BY "subcategory") AS "fav_cheese"
FROM dft
WHERE category = 'CHEESE'
GROUP BY
    customer
This returns:
customer fav_cheese
1 cheddar # customer1's most-frequently-purchased cheese is cheddar
2 blue # customer2's most-frequently-purchased cheese is blue
3 shredded # customer3's most-frequently-purchased cheese is shredded
How can I get the same output in Presto?
I have tried several approaches so far, without success.
You can use a window function:
SELECT customer, subcategory AS fav_cheese
FROM (SELECT customer, category, subcategory, COUNT(*) AS cnt,
             ROW_NUMBER() OVER (PARTITION BY customer ORDER BY COUNT(*) DESC) AS seqnum
      FROM dft
      WHERE category = 'CHEESE'
      GROUP BY customer, category, subcategory
     ) t
WHERE seqnum = 1;
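To try the window-function query without the real table, a few rows can be inlined. The following is only a sketch: the rows in the `dft` CTE are hypothetical sample data, and adding `subcategory` as a second sort key is an assumption about how ties between equally frequent cheeses should be resolved (the original query leaves that choice to the engine).

```sql
-- Hypothetical sample rows standing in for the real dft table.
WITH dft (customer, category, subcategory) AS (
    VALUES
        (1, 'CHEESE', 'cheddar'),
        (1, 'CHEESE', 'cheddar'),
        (1, 'CHEESE', 'blue'),
        (2, 'CHEESE', 'blue'),
        (2, 'CHEESE', 'blue'),
        (2, 'MILK',   'whole'),     -- filtered out by the WHERE clause
        (3, 'CHEESE', 'shredded')
)
SELECT customer, subcategory AS fav_cheese
FROM (
    SELECT customer, subcategory,
           -- Rank each customer's cheeses by purchase count;
           -- subcategory is an assumed tie-breaker (alphabetical).
           ROW_NUMBER() OVER (
               PARTITION BY customer
               ORDER BY COUNT(*) DESC, subcategory
           ) AS seqnum
    FROM dft
    WHERE category = 'CHEESE'
    GROUP BY customer, subcategory
) t
WHERE seqnum = 1
ORDER BY customer;
```

With these sample rows, customer 1 maps to cheddar, customer 2 to blue, and customer 3 to shredded, matching the shape of the Postgres output in the question.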
As a workaround, you can use a histogram-based approach:
SELECT customer,
       MAP_KEYS(hist)[
           ARRAY_POSITION(MAP_VALUES(hist), ARRAY_MAX(MAP_VALUES(hist)))
       ] AS fav_cheese
FROM (
    SELECT customer, histogram(subcategory) AS hist
    FROM dft
    WHERE category = 'CHEESE'
    GROUP BY customer
) AS f
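One thing to watch with the histogram approach is ties: `ARRAY_POSITION` picks whichever maximal count it finds first, and I would not rely on the order of map keys. A possible variant, offered as a sketch rather than a tested solution, keeps only the entries whose count equals the maximum and then takes the smallest key. It assumes a Presto version that provides `map_filter` and lambda expressions; the sample rows are hypothetical.

```sql
-- Hypothetical sample rows; customer 1 has a tie between cheddar and blue.
WITH dft (customer, category, subcategory) AS (
    VALUES
        (1, 'CHEESE', 'cheddar'),
        (1, 'CHEESE', 'blue'),
        (2, 'CHEESE', 'blue'),
        (2, 'CHEESE', 'blue'),
        (2, 'CHEESE', 'cheddar')
)
SELECT customer,
       -- Keep the entries with the highest count, then pick the
       -- alphabetically smallest key so ties resolve the same way every run.
       ARRAY_MIN(
           MAP_KEYS(
               MAP_FILTER(hist, (k, v) -> v = ARRAY_MAX(MAP_VALUES(hist)))
           )
       ) AS fav_cheese
FROM (
    SELECT customer, histogram(subcategory) AS hist
    FROM dft
    WHERE category = 'CHEESE'
    GROUP BY customer
) AS f
ORDER BY customer;
```

The intent is that a tie such as customer 1's resolves to the first value in sort order instead of depending on how the map happens to be laid out.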