Last value from UNNEST collection

I'm having a problem with the following query:

SELECT
   project.id as id,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
   ROUND(SUM(cost), 2) as charges
FROM `cloud.billing.data_123`
WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
GROUP BY id, key1, key2
ORDER by id

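It returns the total monthly spend per project (in the example above, month 6 of 2020). The report is based on the billing data exported to BigQuery. The result looks like this: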

Row | id       | key1 | key2 | charges |
1   |project1  | null | null | 32      | 
2   |project1  | x    | y    | 40      |
3   |project2  | null | null | 50      | 
4   |project2  | x    | y    | 10      |

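The keys are project labels. This happens because the labels key1 and key2 were only added to the projects in the middle of the month. So the first record (nulls in the keys) is the total from before the project had labels, and the second record (with x and y) is the total from after the labels were added.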
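To see why the rows split, here is a minimal Python sketch of what the scalar subquery `(SELECT value FROM UNNEST(project.labels) WHERE key="key1")` does for each billing row, followed by the `GROUP BY id, key1, key2`. The labels are assumed to be a list of `{"key": ..., "value": ...}` dicts, and the rows are illustrative ones that reuse the totals from the table above; `label_value` is a hypothetical helper, not part of any API.

```python
from collections import defaultdict


def label_value(labels, key):
    # Mirrors (SELECT value FROM UNNEST(project.labels) WHERE key=...):
    # the value for `key`, or None when the label is absent.
    # (BigQuery raises an error if the key appears twice; this sketch
    # simply returns the first match.)
    for label in labels:
        if label["key"] == key:
            return label["value"]
    return None


# Illustrative billing rows: (project id, labels, cost).
rows = [
    ("project1", [], 32.0),
    ("project1", [{"key": "key1", "value": "x"}, {"key": "key2", "value": "y"}], 40.0),
    ("project2", [], 50.0),
    ("project2", [{"key": "key1", "value": "x"}, {"key": "key2", "value": "y"}], 10.0),
]

charges = defaultdict(float)
for pid, labels, cost in rows:
    # Grouping by (id, key1, key2): rows from before the labels existed
    # get (None, None) and land in a separate group from the labeled rows.
    charges[(pid, label_value(labels, "key1"), label_value(labels, "key2"))] += cost
```

`charges` ends up with four groups, one unlabeled and one labeled per project, which is exactly the shape of the result above.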

Is there a way to gather everything into a single row per project, with the labels, and sum the values? For example:

Row | id       | key1 | key2 | charges |
1   |project1  | x    | y    | 72      |
2   |project2  | x    | y    | 60      |

Thanks in advance.

As I understand it, you want to sum the cost per project and output id, key1, key2 and the total cost, with key1 and key2 filled in (not null).

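To achieve this, I'll propose two approaches, both assuming that each project has only one distinct key1 value and one distinct key2 value. That is, for example, where project1's key1 is null, it should become x.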
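Since both approaches rely on this assumption, it may be worth checking it before trusting the totals. Below is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for BigQuery, assuming a hypothetical table `data1` that already holds the extracted columns (id, key1, key2, cost). The `project3` rows are made up purely to show what a violation looks like. The same `COUNT(DISTINCT ...)` / `HAVING` query should also work against a `data1` CTE in BigQuery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: labels already extracted into key1/key2.
conn.execute("CREATE TABLE data1 (id TEXT, key1 TEXT, key2 TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO data1 VALUES (?, ?, ?, ?)",
    [
        ("project1", None, None, 32.0),
        ("project1", "x", "y", 40.0),
        ("project2", None, None, 50.0),
        ("project2", "x", "y", 10.0),
        # Made-up project whose key1 label changed value mid-month.
        ("project3", "x", None, 1.0),
        ("project3", "z", None, 2.0),
    ],
)

# COUNT(DISTINCT ...) ignores NULLs, so only real label values are counted.
violations = conn.execute(
    """
    SELECT id, COUNT(DISTINCT key1) AS n_key1, COUNT(DISTINCT key2) AS n_key2
    FROM data1
    GROUP BY id
    HAVING COUNT(DISTINCT key1) > 1 OR COUNT(DISTINCT key2) > 1
    """
).fetchall()
```

An empty result means the assumption holds; here only the deliberately inconsistent `project3` is reported.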

First approach: use FIRST_VALUE() to fill in key1 and key2 where they are null.

WITH data1 AS (
-- One row per billing line, with the two labels extracted (no aggregation yet)
SELECT
   project.id as id,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
   (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
   cost
FROM `cloud.billing.data_123`
WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
),
data2 AS (
-- Spread the first non-null label value over every row of the same project.
-- The order within a project is arbitrary (all rows share the same id),
-- which is fine as long as each project has a single value per label.
SELECT id,
FIRST_VALUE(key1 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key1,
FIRST_VALUE(key2 IGNORE NULLS) OVER (PARTITION BY id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS key2,
cost
FROM data1
)
SELECT id, key1, key2, ROUND(SUM(cost), 2) AS charges FROM data2
GROUP BY id, key1, key2
ORDER BY id

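Note that FIRST_VALUE() is used with IGNORE NULLS, so it returns the first non-null value of key1 and key2 within the specified partition (each id). Because the frame is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, every row of the project receives that value, including the rows recorded before the labels existed. The cost can therefore be summed grouped by id, key1 and key2.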
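The window logic can be emulated in plain Python to check the idea on the example data. This is only a sketch of the semantics, not BigQuery's implementation: `first_non_null` is a hypothetical helper standing in for `FIRST_VALUE(... IGNORE NULLS)` over a whole-partition frame, and insertion order stands in for the window `ORDER BY`. The rows reuse the totals from the question.

```python
from collections import defaultdict

# Illustrative rows of data1: (id, key1, key2, cost).
data1 = [
    ("project1", None, None, 32.0),
    ("project1", "x", "y", 40.0),
    ("project2", None, None, 50.0),
    ("project2", "x", "y", 10.0),
]


def first_non_null(values):
    # FIRST_VALUE(v IGNORE NULLS) over a frame spanning the whole partition:
    # the first non-null value in partition order, or None if all are null.
    return next((v for v in values if v is not None), None)


# PARTITION BY id (rows keep their original order within each partition).
partitions = defaultdict(list)
for row in data1:
    partitions[row[0]].append(row)

# Every row of a partition gets the same filled-in labels, then
# GROUP BY id, key1, key2 with SUM(cost).
charges = defaultdict(float)
for pid, part in partitions.items():
    key1 = first_non_null(r[1] for r in part)
    key2 = first_non_null(r[2] for r in part)
    for _, _, _, cost in part:
        charges[(pid, key1, key2)] += cost

result = sorted((pid, k1, k2, round(total, 2)) for (pid, k1, k2), total in charges.items())
```

`result` matches the desired output: one row per project with x and y, and charges of 72 and 60.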

Second approach: use SELECT DISTINCT and LEFT JOIN

WITH data1 AS (
    -- One row per billing line, with the two labels extracted (no aggregation yet)
    SELECT
       project.id as id,
       (SELECT value FROM UNNEST(project.labels) WHERE key="key1") as key1,
       (SELECT value FROM UNNEST(project.labels) WHERE key="key2") as key2,
       cost
    FROM `cloud.billing.data_123`
    WHERE project.id is not null and EXTRACT(MONTH FROM usage_start_time) = 6 and EXTRACT(YEAR FROM usage_start_time) = 2020
    ),
    data2 AS (
    -- The label values of each project, taken from its labeled rows only
    SELECT DISTINCT id, key1, key2 FROM data1
    WHERE key1 IS NOT NULL AND key2 IS NOT NULL
    )
    -- Attach the labels to every cost row of the project, then sum per project
    SELECT a.id, b.key1, b.key2, ROUND(SUM(a.cost), 2) AS charges
    FROM data1 a LEFT JOIN data2 b ON a.id = b.id
    GROUP BY 1, 2, 3
    ORDER BY 1

The idea is the same as in the first approach: replace the null values (of key1 and key2), then sum the cost for each project.
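The join-based version can be tried locally with Python's built-in `sqlite3`, assuming a hypothetical table `data1` that already holds the extracted columns. SQLite is only a stand-in here: the `UNNEST` label extraction is BigQuery-specific and is skipped, but the `DISTINCT` plus `LEFT JOIN` part runs unchanged apart from grouping by column names instead of ordinals. The rows reuse the question's totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table standing in for the data1 CTE.
conn.execute("CREATE TABLE data1 (id TEXT, key1 TEXT, key2 TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO data1 VALUES (?, ?, ?, ?)",
    [
        ("project1", None, None, 32.0),
        ("project1", "x", "y", 40.0),
        ("project2", None, None, 50.0),
        ("project2", "x", "y", 10.0),
    ],
)

rows = conn.execute(
    """
    WITH data2 AS (
        -- label values per project, from labeled rows only
        SELECT DISTINCT id, key1, key2 FROM data1
        WHERE key1 IS NOT NULL AND key2 IS NOT NULL
    )
    SELECT a.id, b.key1, b.key2, ROUND(SUM(a.cost), 2) AS charges
    FROM data1 a LEFT JOIN data2 b ON a.id = b.id
    GROUP BY a.id, b.key1, b.key2
    ORDER BY a.id
    """
).fetchall()
```

One caveat of this design: if a project violates the one-value-per-label assumption, `data2` holds more than one row for that id and the `LEFT JOIN` matches each cost row once per label combination, so that project's cost is counted more than once. Projects that never had labels still appear, with null keys.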

Output of both:

Row | id       | key1 | key2 | charges |
1   |project1  | x    | y    | 72      |
2   |project2  | x    | y    | 60      |