BigQuery - Select multiple columns and want to exclude two double nested columns
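Hi, I am working with the lt-pcf-analytics-exp.90676036.ga_sessions_* table. I need to extract several variables, including all variables in the nested hits column except the hits.customDimensions.value and hits.customDimensions.index columns. I believe both hits and hits.customDimensions are ARRAYs. How can I do this in Standard SQL?
I found a question about a similar problem (), but in my case I have a double nested array column and I couldn't adapt the code.
Basically, this is what I want to extract. How can I modify it so that hits.customDimensions.value and hits.customDimensions.index are excluded? Thanks.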
SELECT fullVisitorId,
visitId,
visitNumber,
cd.value as PCF_CUST_ID,
date,
TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
totals.visits as visits,
totals.hits as total_hits,
hits.* (EXCEPT hits.customDimensions.value and hits.customDimensions.index)
FROM `lt-pcf-analytics-exp.90676036.ga_sessions_*` as t
left join unnest(customDimensions) as cd
left join unnest(hits) as hits
WHERE _TABLE_SUFFIX between '20210101' and '20210131'
and cd.index = 4 and cd.value is not null
ORDER BY PCF_CUST_ID, visitStartTime, hitNumber
If you check the BigQuery documentation for EXCEPT, you'll see that this is not valid syntax. The documented form is:
SELECT [ AS { typename | STRUCT | VALUE } ] [{ ALL | DISTINCT }]
{ [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...]
So, use it like this:
SELECT hits.* EXCEPT (value, index)
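A small self-contained sketch of the point above, runnable in BigQuery Standard SQL: the names inside EXCEPT are resolved relative to the expression before .*, so you list bare field names, not full paths. The inline row is hypothetical test data shaped loosely like one unnested GA hit (the hitNumber and type values are made up). In the GA export schema, index and value sit inside the hits.customDimensions array rather than directly on hits, so at the hits level the field to name is customDimensions.

```sql
-- Hypothetical inline row standing in for one element of UNNEST(hits).
WITH t AS (
  SELECT STRUCT(
    1 AS hitNumber,
    'PAGE' AS type,
    [STRUCT(4 AS index, 'abc' AS value)] AS customDimensions
  ) AS hits
)
-- EXCEPT takes field names relative to `hits`, not paths like hits.customDimensions.
SELECT hits.* EXCEPT (customDimensions)
FROM t;
```

This returns the columns hitNumber and type; the customDimensions array (with its index and value fields) is gone.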
As @martinus noted, your EXCEPT syntax is incorrect. If you look at the BigQuery documentation, you will see that the correct way to run a query with EXCEPT is:
SELECT
field.* EXCEPT (nested_field1, nested_field2)
FROM `my_table`
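However, you cannot use EXCEPT directly on a field nested inside an array, such as hits.customDimensions.value. As a workaround, exclude the whole hits.customDimensions column from hits.*. Since each element of hits.customDimensions contains only the fields index and value, this removes exactly the two nested fields you want to drop.
A query like the following should work: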
SELECT fullVisitorId,
visitId,
visitNumber,
cd.value as PCF_CUST_ID,
date,
TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
totals.visits as visits,
totals.hits as total_hits,
hits.* EXCEPT (customDimensions)  -- EXCEPT takes a bare field name; dropping the array removes both index and value
FROM `lt-pcf-analytics-exp.90676036.ga_sessions_*` as t
left join unnest(customDimensions) as cd
left join unnest(hits) as hits
WHERE _TABLE_SUFFIX between '20210101' and '20210131'
and cd.index = 4 and cd.value is not null
ORDER BY PCF_CUST_ID, visitStartTime, hitNumber
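If the elements of hits.customDimensions carried other fields you wanted to keep, an EXCEPT on hits.* could not reach inside the array. One Standard SQL pattern for that case is to keep hits.* and use REPLACE (from the syntax quoted in the previous answer) to rebuild the array with ARRAY(SELECT AS STRUCT ... EXCEPT ...). The sketch below is not tied to the real table: the inline data is hypothetical, and the scope field is invented purely so that something remains after removing index and value. With the real GA schema, excluding the whole array as in the query above is enough.

```sql
-- Hypothetical inline data: `scope` is NOT a real GA export field; it exists
-- only so that each array element still has a field after index and value go.
WITH sessions AS (
  SELECT
    'visitor_1' AS fullVisitorId,
    [
      STRUCT(
        1 AS hitNumber,
        [STRUCT(4 AS index, 'abc' AS value, 'HIT' AS scope)] AS customDimensions
      )
    ] AS hits
)
SELECT
  s.fullVisitorId,
  -- Keep every field of the hit, but rebuild customDimensions without index and value.
  h.* REPLACE (
    ARRAY(
      SELECT AS STRUCT d.* EXCEPT (index, value)
      FROM UNNEST(h.customDimensions) AS d
    ) AS customDimensions
  )
FROM sessions AS s
CROSS JOIN UNNEST(s.hits) AS h;
```

Each output row keeps fullVisitorId and hitNumber, and customDimensions becomes an array whose elements contain only scope.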