Spark SQL: select from the result of a GROUP BY select
I have a table called table_new.
As a first step, I want to group the rows by id, kmstand, vacationame and vacationvalue, keeping only the groups that contain exactly one row. For this step I wrote the following query:
SELECT id, kmstand, vacationame, vacationvalue
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame, vacationvalue
HAVING COUNT(*) = 1 ORDER BY id, kmstand DESC
The result is:
     id  kmstand  vacationame  vacationvalue
 1    1     4000  vacation1    munich
 2    1     4000  vacation1    stuttgart
 3    1     5500  vacation4    koln
 4    1     5500  vacation2    frankfurt
 5    1     5500  vacation3    berlin
 6    1     5500  vacation1    potsdam
 7    2     6000  vacation2    new york
 8    2     6000  vacation1    bangladesh
 9    2     3000  vacation1    washington
10    2     3000  vacation3    chicago
Now I want to select the IDs for which the combination of kmstand and vacationame is distinct. That means the result should be:
     id  kmstand  vacationame  vacationvalue
 1    1     5500  vacation4    koln
 2    1     5500  vacation2    frankfurt
 3    1     5500  vacation3    berlin
 4    1     5500  vacation1    potsdam
 5    2     6000  vacation2    new york
 6    2     6000  vacation1    bangladesh
 7    2     3000  vacation1    washington
 8    2     3000  vacation3    chicago
For this I wrote the following nested SQL query:
SELECT id, kmstand, count(*) as cnt
FROM `db_1`.`table_new`
WHERE (SELECT id, kmstand, vacationame, vacationvalue
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame, vacationvalue
HAVING COUNT(*) = 1 ORDER BY id, kmstand DESC)
GROUP BY id, kmstand
HAVING cnt = 1
ORDER BY id, kmstand DESC
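I also tried it without the WHERE clause and without the FROM, but found no solution. For this SQL query I get the following error message: org.apache.spark.sql.AnalysisException: cannot recognize input near 'SELECT' 'id' ',' in expression specification; line 3 pos 7
I am fairly sure the syntax is wrong. Do you have any suggestions?
I'm not familiar with Spark, but you probably want: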
SELECT id, kmstand, count(*) as cnt
FROM (SELECT id, kmstand, vacationame, vacationvalue
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame, vacationvalue
HAVING COUNT(*) = 1) T
GROUP BY id, kmstand
HAVING cnt = 1
ORDER BY id, kmstand DESC
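Note that I added an alias (T) for the derived table in the FROM clause. This may or may not be required, depending on your RDBMS.
Also note that you generally cannot use ORDER BY in a subquery.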
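To see the derived-table form run end to end, here is a minimal sketch against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for Spark, which is purely an assumption to keep the example self-contained; Spark SQL may differ in details. The helper names make_conn and counts_per_id_kmstand are hypothetical, the table is created in memory without the db_1 prefix, and an IN list replaces the chain of ORs.

```python
import sqlite3

# Sample rows from the question: (id, kmstand, vacationame, vacationvalue).
ROWS = [
    (1, 4000, "vacation1", "munich"),
    (1, 4000, "vacation1", "stuttgart"),
    (1, 5500, "vacation4", "koln"),
    (1, 5500, "vacation2", "frankfurt"),
    (1, 5500, "vacation3", "berlin"),
    (1, 5500, "vacation1", "potsdam"),
    (2, 6000, "vacation2", "new york"),
    (2, 6000, "vacation1", "bangladesh"),
    (2, 3000, "vacation1", "washington"),
    (2, 3000, "vacation3", "chicago"),
]

# The inner query from the question, without its ORDER BY.
INNER = """
    SELECT id, kmstand, vacationame, vacationvalue
    FROM table_new
    WHERE vacationame IN ('vacation1', 'vacation2', 'vacation3', 'vacation4')
    GROUP BY id, kmstand, vacationame, vacationvalue
    HAVING COUNT(*) = 1
"""


def make_conn():
    """Create an in-memory table_new filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_new "
        "(id INTEGER, kmstand INTEGER, vacationame TEXT, vacationvalue TEXT)"
    )
    conn.executemany("INSERT INTO table_new VALUES (?, ?, ?, ?)", ROWS)
    return conn


def counts_per_id_kmstand(conn):
    """Count inner-query rows per (id, kmstand); the subquery sits in FROM as T."""
    sql = f"""
        SELECT id, kmstand, COUNT(*) AS cnt
        FROM ({INNER}) T
        GROUP BY id, kmstand
        ORDER BY id, kmstand DESC
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in counts_per_id_kmstand(make_conn()):
        print(row)
```

On these sample rows every (id, kmstand) pair has more than one row, so adding HAVING cnt = 1 to the outer query would leave nothing. The sketch only shows that the subquery is accepted once it moves from WHERE into FROM.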
This is the answer to my question. With it I can now get the ids for which the combination of id, kmstand and vacationame is distinct.
SELECT id, sumcnt, cnt2
FROM(
SELECT id, count(*) as cnt2, sum(cnt) as sumcnt
FROM (
SELECT id, kmstand, vacationame, count(*) as cnt
FROM `db_1`.`table_new`
WHERE (vacationame='vacation1'
OR vacationame='vacation2'
OR vacationame='vacation3'
OR vacationame='vacation4')
GROUP BY id, kmstand, vacationame)T
GROUP BY id)T
WHERE (sumcnt/cnt2 = 1)
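The query above returns ids. If the row-level result shown in the question is what is wanted, one possible variant is to group on (id, kmstand, vacationame) and keep only the groups with exactly one row. The sketch below is not the poster's query; it runs on Python's built-in sqlite3 module as a stand-in for Spark (an assumption made so the example is self-contained), and unique_combination_rows is a hypothetical helper name. MIN(vacationvalue) is used only to pull the single value out of each one-row group.

```python
import sqlite3

# Sample rows from the question: (id, kmstand, vacationame, vacationvalue).
ROWS = [
    (1, 4000, "vacation1", "munich"),
    (1, 4000, "vacation1", "stuttgart"),
    (1, 5500, "vacation4", "koln"),
    (1, 5500, "vacation2", "frankfurt"),
    (1, 5500, "vacation3", "berlin"),
    (1, 5500, "vacation1", "potsdam"),
    (2, 6000, "vacation2", "new york"),
    (2, 6000, "vacation1", "bangladesh"),
    (2, 3000, "vacation1", "washington"),
    (2, 3000, "vacation3", "chicago"),
]


def unique_combination_rows(rows):
    """Return rows whose (id, kmstand, vacationame) combination occurs once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_new "
        "(id INTEGER, kmstand INTEGER, vacationame TEXT, vacationvalue TEXT)"
    )
    conn.executemany("INSERT INTO table_new VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT id, kmstand, vacationame, MIN(vacationvalue) AS vacationvalue
        FROM table_new
        WHERE vacationame IN ('vacation1', 'vacation2', 'vacation3', 'vacation4')
        GROUP BY id, kmstand, vacationame
        HAVING COUNT(*) = 1
        ORDER BY id, kmstand DESC
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in unique_combination_rows(ROWS):
        print(row)
```

On the sample rows this drops the two (1, 4000, vacation1) rows and keeps the other eight, which matches the expected result in the question. The order of rows that share the same id and kmstand is not fixed by the ORDER BY.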