SQL Count with Grouped aggregates
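I'm trying to write a SQL query that lets me build a line chart from historical data. I want to see, at daily granularity, how many users are on each version of my application over time. The Y axis would be the percentage of usage across the whole application (out of 100), the X axis is the date, and each build is a separate line. At any point in time, all lines should sum to 100%.
Since this query has to be grouped by Version/Build in addition to the date, I'm trying to work out how to get each group's percentage of the total users for any given day. So far I have this query: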
SELECT DISTINCT
    sub.Version,
    sub.Build,
    sub.app_id,
    sub.Users,
    sub.`day`,
    (
        SELECT COUNT(DISTINCT user_id)
        FROM snowplow_enricher_good seg
    ) AS Total,
    (sub.Users/Total) * 100 AS Percent
FROM
(
    SELECT
        visitParamExtractString(seg.contexts, 'version') AS Version,
        visitParamExtractString(seg.contexts, 'build') AS Build,
        seg.app_id,
        seg.`day`,
        CONCAT(
            Version,
            ' (',
            Build,
            ')'
        ) AS AppBuildVersion,
        COUNT(DISTINCT seg.user_id) AS Users
    FROM snowplow_enricher_good seg
    GROUP BY Version, Build, app_id, `day`
    ORDER BY Users DESC
) AS sub
WHERE sub.app_id = 'APPID';
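Note that the percentage currently shown is relative to all days combined, not to a single day. I tried adding a WHERE clause to the custom FROM statement, but it failed.
Thanks in advance :)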
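Using arrays: groupArray keeps every (tag, value) pair next to the aggregate totals, and the outer ARRAY JOIN expands them back into rows, so each row can be divided by totalSum.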
SELECT
    totalCnt,
    totalSum,
    ga.1 AS tag,
    ga.2 AS value,
    (value / totalSum) * 100 AS percent
FROM
(
    SELECT
        count() AS totalCnt,
        sum(value) AS totalSum,
        groupArray((tag, value)) AS ga
    FROM
    (
        SELECT
            tag,
            value
        FROM
        (
            SELECT
                [1, 2, 3, 4, 5] AS tag,
                [10, 100, 50, 100, 40] AS value
        )
        ARRAY JOIN
            tag,
            value
    )
)
ARRAY JOIN ga
┌─totalCnt─┬─totalSum─┬─tag─┬─value─┬────────────percent─┐
│        5 │      300 │   1 │    10 │ 3.3333333333333335 │
│        5 │      300 │   2 │   100 │  33.33333333333333 │
│        5 │      300 │   3 │    50 │ 16.666666666666664 │
│        5 │      300 │   4 │   100 │  33.33333333333333 │
│        5 │      300 │   5 │    40 │ 13.333333333333334 │
└──────────┴──────────┴─────┴───────┴────────────────────┘
I was able to solve the problem with a series of joins and subqueries:
SELECT
    day,
    app_id,
    version,
    version_count,
    app_count,
    (version_count / app_count) * 100 AS percent
FROM (
    -- Distinct users per day, app and version (the numerator).
    SELECT
        day,
        app_id,
        visitParamExtractString(contexts, 'version') AS version,
        count(DISTINCT user_id) AS version_count
    FROM
        snowplow_enricher_good
    WHERE
        day >= subtractDays(today(), 30)
    GROUP BY
        day,
        app_id,
        version
)
INNER JOIN (
    -- Distinct users per day and app (the denominator).
    SELECT
        day,
        app_id,
        count(DISTINCT user_id) AS app_count
    FROM
        snowplow_enricher_good
    WHERE
        day >= subtractDays(today(), 30)
    GROUP BY
        day,
        app_id
)
USING (
    day,
    app_id
)
WHERE
    app_id = 'APPID'
ORDER BY
    day DESC,
    app_id,
    version;
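A possible alternative, offered as an untested sketch rather than a drop-in replacement: if your ClickHouse release supports window functions, the per-day total can be computed without the second table scan and the join. It assumes the same snowplow_enricher_good table and columns as the query above; the alias day_total is a name introduced here purely for illustration. Note the different denominator: it is the sum of the per-version counts, not the number of distinct users per day. That makes the rows for a day add up to exactly 100%, but a user seen on two versions on the same day is counted once for each version.

```sql
-- Sketch only: assumes a ClickHouse release with window function support.
-- day_total is a hypothetical alias, not part of the original queries.
SELECT
    day,
    app_id,
    version,
    version_count,
    day_total,
    (version_count / day_total) * 100 AS percent
FROM
(
    SELECT
        day,
        app_id,
        version,
        version_count,
        -- Total of the per-version counts within the same day and app.
        sum(version_count) OVER (PARTITION BY day, app_id) AS day_total
    FROM
    (
        -- Distinct users per day, app and version, as in the query above.
        SELECT
            day,
            app_id,
            visitParamExtractString(contexts, 'version') AS version,
            count(DISTINCT user_id) AS version_count
        FROM snowplow_enricher_good
        WHERE day >= subtractDays(today(), 30)
          AND app_id = 'APPID'
        GROUP BY
            day,
            app_id,
            version
    )
)
ORDER BY
    day DESC,
    version;
```

The window total is computed in its own subquery and the percentage in the outer SELECT, so the sketch does not depend on using a window function result inside an expression in the same SELECT list.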