Impala SQL: transposing/pivoting one row into columns, or an alternative method for grouping by range
SELECT
SUM(CASE WHEN age >= 80 THEN 1 ELSE 0 END) AS '>=80',
SUM(CASE WHEN age BETWEEN 70 AND 79 THEN 1 ELSE 0 END) AS '70-79',
SUM(CASE WHEN age BETWEEN 60 AND 69 THEN 1 ELSE 0 END) AS '60-69',
SUM(CASE WHEN age BETWEEN 50 AND 59 THEN 1 ELSE 0 END) AS '50-59',
SUM(CASE WHEN age BETWEEN 40 AND 49 THEN 1 ELSE 0 END) AS '40-49',
SUM(CASE WHEN age BETWEEN 30 AND 39 THEN 1 ELSE 0 END) AS '30-39',
SUM(CASE WHEN age BETWEEN 20 AND 29 THEN 1 ELSE 0 END) AS '20-29',
SUM(CASE WHEN age BETWEEN 10 AND 19 THEN 1 ELSE 0 END) AS '10-19',
SUM(CASE WHEN age BETWEEN 0 AND 9 THEN 1 ELSE 0 END) AS '0-9'
FROM (SELECT * FROM table) a
I use the query above to bucket ages into ranges. It outputs:
+------+-------+-------+-------+-------+-------+--------+---------+---------+
| >=80 | 70-79 | 60-69 | 50-59 | 40-49 | 30-39 | 20-29 | 10-19 | 0-9 |
+------+-------+-------+-------+-------+-------+--------+---------+---------+
| 136 | 394 | 1273 | 2530 | 3298 | 15384 | 194099 | 2244405 | 9780789 |
+------+-------+-------+-------+-------+-------+--------+---------+---------+
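I need to convert this to a columnar (long) format, or find another bucketing method, so that the query above produces one row per age range instead of a single row of values.
Desired output: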
+-----------+---------+
| age_range | freq    |
+-----------+---------+
| >=80      | 136     |
| 70-79     | 394     |
| 60-69     | 1273    |
| 50-59     | 2530    |
| 40-49     | 3298    |
| 30-39     | 15384   |
| 20-29     | 194099  |
| 10-19     | 2244405 |
| 0-9       | 9780789 |
+-----------+---------+
As far as I know, Impala does not support PIVOT?
Any help is appreciated, thanks.
Use a CASE expression as the GROUP BY key:
SELECT (CASE WHEN age >= 80 THEN '>=80'
             WHEN age BETWEEN 70 AND 79 THEN '70-79'
             WHEN age BETWEEN 60 AND 69 THEN '60-69'
             WHEN age BETWEEN 50 AND 59 THEN '50-59'
             WHEN age BETWEEN 40 AND 49 THEN '40-49'
             WHEN age BETWEEN 30 AND 39 THEN '30-39'
             WHEN age BETWEEN 20 AND 29 THEN '20-29'
             WHEN age BETWEEN 10 AND 19 THEN '10-19'
             WHEN age BETWEEN 0 AND 9 THEN '0-9'
        END) AS age_group,
       COUNT(*) AS freq
FROM a
GROUP BY age_group;
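To see the long-format result without an Impala cluster, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. SQLite is only a stand-in here, not Impala, and the table name `people`, the sample ages, and the `ORDER BY MIN(age) DESC` clause (added to get the rows in the order shown in the desired output) are assumptions made for illustration.

```python
import sqlite3

# Toy data standing in for the real table; the table name `people` and these
# ages are hypothetical and only serve to exercise the query.
ages = [3, 7, 12, 15, 18, 25, 30, 34, 61, 85]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER)")
conn.executemany("INSERT INTO people (age) VALUES (?)", [(a,) for a in ages])

# One output row per age bucket: the CASE expression is the grouping key.
query = """
SELECT CASE WHEN age >= 80 THEN '>=80'
            WHEN age BETWEEN 70 AND 79 THEN '70-79'
            WHEN age BETWEEN 60 AND 69 THEN '60-69'
            WHEN age BETWEEN 50 AND 59 THEN '50-59'
            WHEN age BETWEEN 40 AND 49 THEN '40-49'
            WHEN age BETWEEN 30 AND 39 THEN '30-39'
            WHEN age BETWEEN 20 AND 29 THEN '20-29'
            WHEN age BETWEEN 10 AND 19 THEN '10-19'
            WHEN age BETWEEN 0 AND 9 THEN '0-9'
       END AS age_range,
       COUNT(*) AS freq
FROM people
GROUP BY age_range
ORDER BY MIN(age) DESC  -- oldest bucket first, as in the desired output
"""

rows = conn.execute(query).fetchall()
for age_range, freq in rows:
    print(age_range, freq)
```

One difference from the single-row SUM(CASE ...) query is worth keeping in mind: a bucket that contains no rows produces no output row at all here, whereas the single-row version reports it as 0. With the toy data above, the 40-49, 50-59 and 70-79 buckets are simply absent.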
Edit:
A simpler way to write this is:
SELECT (CASE WHEN age >= 80 THEN '>=80'
             WHEN age >= 70 THEN '70-79'
             WHEN age >= 60 THEN '60-69'
             WHEN age >= 50 THEN '50-59'
             WHEN age >= 40 THEN '40-49'
             WHEN age >= 30 THEN '30-39'
             WHEN age >= 20 THEN '20-29'
             WHEN age >= 10 THEN '10-19'
             WHEN age >= 0 THEN '0-9'
        END) AS age_group,
       COUNT(*) AS freq
FROM a
GROUP BY age_group;
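CASE logic stops at the first matching condition, so each branch only sees the ages that failed every higher threshold before it.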
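The first-match behaviour can be checked directly. The sketch below again uses an in-memory SQLite database from Python as a stand-in for Impala, with a hypothetical `people` table holding every integer age from 0 to 99. It labels each age with both the explicit BETWEEN ranges and the descending `>=` thresholds, then collects any rows where the two labels disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one row per integer age from 0 to 99.
conn.execute("CREATE TABLE people (age INTEGER)")
conn.executemany("INSERT INTO people (age) VALUES (?)", [(a,) for a in range(100)])

query = """
SELECT age,
       -- Explicit, non-overlapping ranges.
       CASE WHEN age >= 80 THEN '>=80'
            WHEN age BETWEEN 70 AND 79 THEN '70-79'
            WHEN age BETWEEN 60 AND 69 THEN '60-69'
            WHEN age BETWEEN 50 AND 59 THEN '50-59'
            WHEN age BETWEEN 40 AND 49 THEN '40-49'
            WHEN age BETWEEN 30 AND 39 THEN '30-39'
            WHEN age BETWEEN 20 AND 29 THEN '20-29'
            WHEN age BETWEEN 10 AND 19 THEN '10-19'
            WHEN age BETWEEN 0 AND 9 THEN '0-9'
       END AS explicit_label,
       -- Descending thresholds; relies on CASE stopping at the first match.
       CASE WHEN age >= 80 THEN '>=80'
            WHEN age >= 70 THEN '70-79'
            WHEN age >= 60 THEN '60-69'
            WHEN age >= 50 THEN '50-59'
            WHEN age >= 40 THEN '40-49'
            WHEN age >= 30 THEN '30-39'
            WHEN age >= 20 THEN '20-29'
            WHEN age >= 10 THEN '10-19'
            WHEN age >= 0 THEN '0-9'
       END AS first_match_label
FROM people
"""

rows = conn.execute(query).fetchall()
# Rows where the two labelling schemes disagree; expected to be empty.
mismatches = [row for row in rows if row[1] != row[2]]
print(len(rows), "ages checked,", len(mismatches), "mismatches")
```

For integer ages the two forms agree on every value. If `age` could hold fractional values such as 29.5, the BETWEEN version would leave that row unlabelled (NULL), while the threshold version would still place it in 20-29.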