Last value from UNNEST collection
I am having trouble with the following query:
SELECT
project.id as id,
(SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
(SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
ROUND(SUM(cost), 2) as charges
FROM `cloud.billing.data_123`
WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
GROUP BY id, key1, key2
ORDER by id
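It computes the total amount spent per project per month (in the example above, month 6 of 2020). The report is based on the billing export to BigQuery. The result looks like this: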
Row | id | key1 | key2 | charges |
1 |project1 | null | null | 32 |
2 |project1 | x | y | 40 |
3 |project2 | null | null | 50 |
4 |project2 | x | y | 10 |
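The keys are project labels. The split happens because the labels key1 and key2 were only added to the projects in the middle of the month. So the first record (nulls in the key columns) is the total for the period when the project had no labels, and the second record (with x and y) is the total for the period when it did.
Is there a way to collapse everything into a single labelled row per project and sum the values, like this: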
Row | id | key1 | key2 | charges |
1 |project1 | x | y | 72 |
2 |project2 | x | y | 60 |
Thanks in advance.
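As I understand it, you want to sum the cost per project and output id, key1, key2 and the summed cost, with key1 and key2 not null.
To achieve this I will propose two approaches, both assuming that each project has only one distinct key1 and one distinct key2. That is, wherever key1 for project1 is null, it should be treated as x.
First approach: use FIRST_VALUE() to fill in key1 and key2 on the rows where they are null.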
WITH data1 AS (
  -- One row per billing line item, with the two labels pulled out of project.labels
  SELECT
    project.id AS id,
    (SELECT value FROM UNNEST(project.labels) WHERE key = "key1") AS key1,
    (SELECT value FROM UNNEST(project.labels) WHERE key = "key2") AS key2,
    cost
  FROM `cloud.billing.data_123`
  WHERE project.id IS NOT NULL
    AND EXTRACT(MONTH FROM usage_start_time) = 6
    AND EXTRACT(YEAR FROM usage_start_time) = 2020
),
data2 AS (
  -- Backfill null labels with the first non-null value found for the same project
  SELECT
    id,
    FIRST_VALUE(key1 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key1,
    FIRST_VALUE(key2 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key2,
    cost
  FROM data1
)
SELECT id, key1, key2, ROUND(SUM(cost), 2) AS charges
FROM data2
GROUP BY id, key1, key2
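Note that FIRST_VALUE() is used together with IGNORE NULLS, so it returns the first non-null value of key1 and key2 within the specified partition. Once every row of a project carries the same labels, the cost can be summed grouped by id, key1 and key2.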
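To make the backfill step concrete, here is a minimal Python sketch of the same logic outside BigQuery. The sample rows are an assumption that mirrors the result table from the question, `None` stands in for SQL NULL, and `first_non_null` and `backfill_and_sum` are hypothetical helper names rather than BigQuery functions.

```python
from collections import defaultdict

# Hypothetical sample rows: (project id, key1, key2, cost); None stands in for NULL.
rows = [
    ("project1", None, None, 32.0),
    ("project1", "x", "y", 40.0),
    ("project2", None, None, 50.0),
    ("project2", "x", "y", 10.0),
]


def first_non_null(values):
    """Return the first value that is not None, or None if there is none.

    This mimics what FIRST_VALUE(col IGNORE NULLS) yields over a whole partition.
    """
    return next((v for v in values if v is not None), None)


def backfill_and_sum(rows):
    # Split the rows by project id, the equivalent of PARTITION BY id.
    partitions = defaultdict(list)
    for project_id, key1, key2, cost in rows:
        partitions[project_id].append((key1, key2, cost))

    totals = {}
    for project_id, part in partitions.items():
        key1 = first_non_null(k1 for k1, _, _ in part)
        key2 = first_non_null(k2 for _, k2, _ in part)
        # All rows of the project now share one label pair, so one sum is enough.
        totals[(project_id, key1, key2)] = round(sum(c for _, _, c in part), 2)
    return totals


result = backfill_and_sum(rows)
```

With these rows, `result` holds one entry per project, keyed by the backfilled labels. The sketch relies on the same assumption as the query: a project has at most one distinct non-null value for each label.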
Second approach: use SELECT DISTINCT and LEFT JOIN
WITH data1 AS (
  -- One row per billing line item, with the two labels pulled out of project.labels
  SELECT
    project.id AS id,
    (SELECT value FROM UNNEST(project.labels) WHERE key = "key1") AS key1,
    (SELECT value FROM UNNEST(project.labels) WHERE key = "key2") AS key2,
    cost
  FROM `cloud.billing.data_123`
  WHERE project.id IS NOT NULL
    AND EXTRACT(MONTH FROM usage_start_time) = 6
    AND EXTRACT(YEAR FROM usage_start_time) = 2020
),
data2 AS (
  -- Lookup table: the non-null label pair of each project
  SELECT DISTINCT id, key1, key2
  FROM data1
  WHERE key1 IS NOT NULL AND key2 IS NOT NULL
)
-- Take the labels from the lookup table, so unlabelled rows pick them up too
SELECT a.id, b.key1, b.key2, ROUND(SUM(a.cost), 2) AS charges
FROM data1 a
LEFT JOIN data2 b ON a.id = b.id
GROUP BY 1, 2, 3
The idea is the same as in the first approach: replace the null values of key1 and key2, then sum the cost per project.
Output of both:
Row | id | key1 | key2 | charges |
1 |project1 | x | y | 72 |
2 |project2 | x | y | 60 |
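The second approach can also be tried locally. The sketch below runs the DISTINCT plus LEFT JOIN query against an in-memory SQLite table through Python's `sqlite3` module. SQLite is only a stand-in for BigQuery here, and the `data1` table with its four sample rows is an assumption taken from the result table in the question.

```python
import sqlite3

# In-memory table standing in for the data1 CTE (labels already extracted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data1 (id TEXT, key1 TEXT, key2 TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO data1 VALUES (?, ?, ?, ?)",
    [
        ("project1", None, None, 32.0),  # before the labels were added
        ("project1", "x", "y", 40.0),    # after the labels were added
        ("project2", None, None, 50.0),
        ("project2", "x", "y", 10.0),
    ],
)

query = """
WITH data2 AS (
  -- Lookup table: the non-null label pair of each project
  SELECT DISTINCT id, key1, key2
  FROM data1
  WHERE key1 IS NOT NULL AND key2 IS NOT NULL
)
SELECT a.id, b.key1, b.key2, ROUND(SUM(a.cost), 2) AS charges
FROM data1 a
LEFT JOIN data2 b ON a.id = b.id
GROUP BY a.id, b.key1, b.key2
ORDER BY a.id
"""

joined = conn.execute(query).fetchall()
conn.close()
```

Because of the LEFT JOIN, a project that never had labels in the month would still appear, with null key1 and key2. If a project had more than one distinct non-null label pair, each cost row would match once per pair and the totals would be overstated, so the one-pair-per-project assumption matters.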