Select multiple keys and values from a NESTED column
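I am linking Google Analytics 4 events to a BigQuery table.
I can retrieve data based on a single key, but how do I get the value stored under another key in the same record?
Specifically, I want to rank page views by article name and show the author name in a separate column as supplementary data (the author name is stored in the same record, but under a different key in the nested column).
Environment
Google Analytics 4
Events are set up in Google Tag Manager
In the table schema, event_params.key contains keys such as article_name and author_name, and event_params.value.string_value contains the values I want to retrieve.
The table preview looks like this: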
+-----+------------+-----------------+--------------+------------------+---------------------------------+
| Row | event_date | event_timestamp | event_name   | event_params.key | event_params.value.string_value |
+-----+------------+-----------------+--------------+------------------+---------------------------------+
| 1   | 20201127   | 160394324324231 | view_article | article_name     | My Article A                    |
|     |            |                 |              | author_name      | Author A                        |
|     |            |                 |              | random key1      | random value1                   |
|     |            |                 |              | random key2      | random value2                   |
| 2   | 20201127   | 160394324324112 | view_article | article_name     | My Article B                    |
|     |            |                 |              | author_name      | Author B                        |
|     |            |                 |              | random key1      | random value3                   |
|     |            |                 |              | random key2      | random value4                   |
...
+-----+------------+-----------------+--------------+------------------+---------------------------------+
What I tried
I was able to get the article ranking on its own, without the author name.
#standardSQL
WITH _data AS (
  SELECT
    value.string_value AS article_name
  FROM
    `my-new-project.analytics_000000000.events_*`,
    UNNEST(event_params)
  WHERE
    event_name = 'article_view'
)
SELECT
  article_name,
  COUNT(*) AS cnt
FROM
  _data
GROUP BY
  1
ORDER BY
  2 DESC
Result:
+-----+--------------+-----+
| Row | article_name | cnt |
+-----+--------------+-----+
| 1   | My Article A | 20  |
| 2   | My Article D | 18  |
| 3   | My Article C | 11  |
| 4   | My Article B | 9   |
...
+-----+--------------+-----+
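I want to add a column for author_name next to article_name here, so I thought using CASE WHEN would be a good idea.
However, author_name turns out to be all null, which probably means each key is treated as a separate record.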
#standardSQL
WITH _data AS (
  SELECT
    CASE WHEN key = 'article_name' THEN value.string_value
    END AS article_name,
    CASE WHEN key = 'author_name' THEN value.string_value
    END AS author_name
  FROM
    `my-new-project.analytics_000000000.events_*`,
    UNNEST(event_params)
  WHERE
    key = 'article_name'
)
SELECT
  article_name,
  MAX(author_name),
  COUNT(*) AS cnt
FROM
  _data
GROUP BY
  1
ORDER BY
  3 DESC
Result:
+-----+--------------+-------------+-----+
| Row | article_name | author_name | cnt |
+-----+--------------+-------------+-----+
| 1   | My Article A | null        | 20  |
| 2   | My Article D | null        | 18  |
| 3   | My Article C | null        | 11  |
| 4   | My Article B | null        | 9   |
...
+-----+--------------+-------------+-----+
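When I GROUP BY author_name instead, in descending order, the author names appear correctly, but this time article_name is all null. Is it possible to have article_name and author_name appear in the same record, so that the author name sits next to the article name in the same ranking result?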
It looks like you are filtering out 'author_name'. What if you change the WHERE clause in the CTE to WHERE key IN ('article_name', 'author_name')?
Your query would then look like this:
#standardSQL
WITH _data AS (
  SELECT
    CASE WHEN key = 'article_name' THEN value.string_value
    END AS article_name,
    CASE WHEN key = 'author_name' THEN value.string_value
    END AS author_name
  FROM
    `my-new-project.analytics_000000000.events_*`,
    UNNEST(event_params)
  WHERE
    key IN ('article_name', 'author_name')
)
SELECT
  article_name,
  MAX(author_name),
  COUNT(*) AS cnt
FROM
  _data
GROUP BY
  1
ORDER BY
  3 DESC
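One caveat with the query above: the comma join with UNNEST(event_params) yields one row per parameter, so a row that holds article_name never also holds author_name, and grouping by article_name can still return NULL authors. A possible way to keep the CASE WHEN style is to aggregate inside each event's own array instead. The sketch below is only an illustration under assumptions: the inline `events` CTE is made-up sample data standing in for the export table, shaped like the preview in the question.

```sql
#standardSQL
-- Hypothetical inline rows standing in for the GA4 export table.
WITH events AS (
  SELECT 'view_article' AS event_name,
         [STRUCT('article_name' AS key, STRUCT('My Article A' AS string_value) AS value),
          STRUCT('author_name' AS key, STRUCT('Author A' AS string_value) AS value)] AS event_params
  UNION ALL
  SELECT 'view_article',
         [STRUCT('article_name' AS key, STRUCT('My Article B' AS string_value) AS value),
          STRUCT('author_name' AS key, STRUCT('Author B' AS string_value) AS value)]
),
_data AS (
  SELECT
    -- Each subquery scans only this event's array, so both values
    -- end up on the same output row (one row per event).
    (SELECT MAX(CASE WHEN key = 'article_name' THEN value.string_value END)
     FROM UNNEST(event_params)) AS article_name,
    (SELECT MAX(CASE WHEN key = 'author_name' THEN value.string_value END)
     FROM UNNEST(event_params)) AS author_name
  FROM events
)
SELECT
  article_name,
  MAX(author_name) AS author_name,
  COUNT(*) AS cnt
FROM _data
GROUP BY article_name
ORDER BY cnt DESC
```

To run this against real data, the `events` CTE would be dropped and `FROM events` pointed at the export table from the question.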
I think you are looking for a sub-select solution:
SELECT
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'article_name') AS article_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'author_name') AS author_name,
count(1) as cnt
FROM
`my-new-project.analytics_000000000.events_*`
GROUP BY 1,2
ORDER BY 3 DESC
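Note that the query above counts every event in the table, so events that carry no article_name would land in a NULL group. A minimal sketch of adding an event filter follows, again on made-up inline rows: the `events` CTE and its page_view row are hypothetical, and the event name view_article is taken from the table preview (the asker's own query filters on 'article_view', so use whichever name the property actually sends).

```sql
#standardSQL
-- Hypothetical inline rows standing in for the GA4 export table.
WITH events AS (
  SELECT 'view_article' AS event_name,
         [STRUCT('article_name' AS key, STRUCT('My Article A' AS string_value) AS value),
          STRUCT('author_name' AS key, STRUCT('Author A' AS string_value) AS value)] AS event_params
  UNION ALL
  -- An unrelated event without article parameters.
  SELECT 'page_view',
         [STRUCT('random key1' AS key, STRUCT('random value1' AS string_value) AS value)]
)
SELECT
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'article_name') AS article_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'author_name') AS author_name,
  COUNT(*) AS cnt
FROM events
-- Keep only article views so other events do not form a NULL group.
WHERE event_name = 'view_article'
GROUP BY article_name, author_name
ORDER BY cnt DESC
```

Each scalar subquery expects at most one matching parameter per event; if an event ever carried the same key twice, BigQuery would raise an error rather than pick one value.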