BigQuery: select non-duplicate records
Consider the following table (simplified):
id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime
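The schema above stores POS receipts for restaurants. The table sometimes contains several receipts for the same date with the same transaction_no and location_id.
In that case, I want to get the last receipt for each location_id and transaction_no, ordered by created_at DESC.
In MySQL, I use the following query to get the last receipt (max(created_at)) for each location_id and transaction_no: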
SELECT id, amount, transaction_no, location_id, created_at
FROM receipts r JOIN
(SELECT transaction_no, max(created_at) AS maxca
FROM receipts r
GROUP BY transaction_no
) t
ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
group by location_id;
But when I run the same query in BigQuery, I get the following error:
Query Failed Error: Shuffle reached broadcast limit for table __I0
(broadcasted at least 150393576 bytes). Consider using partitioned
joins instead of broadcast joins . Job ID:
circular-gist-812:job_A_CfsSKJICuRs07j7LHVbkqcpSg
Any idea how to make the above query work in BigQuery?
SELECT id, amount, transaction_no, location_id, created_at
FROM (
SELECT
id, amount, transaction_no, location_id, created_at,
ROW_NUMBER() OVER(PARTITION BY transaction_no, location_id
ORDER BY created_at DESC) as last
FROM your_dataset.your_table
)
WHERE last = 1
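The ROW_NUMBER() query above can be sanity-checked on a few rows before running it on the full table. The following is a minimal sketch, not part of the original answer: it runs the same pattern against an in-memory SQLite database (assumed to be version 3.25 or later, which supports window functions) rather than BigQuery. The table name receipts comes from the question, the sample rows are invented for illustration, and the alias is renamed from last to rn only to avoid any keyword clash in SQLite.

```python
import sqlite3

# Hypothetical sample rows: (id, amount, transaction_no, location_id, created_at)
rows = [
    (1, 10.00, "T1", 100, "2015-06-01 12:00:00"),
    (2, 12.50, "T1", 100, "2015-06-01 12:05:00"),  # later duplicate of id 1
    (3, 8.00, "T1", 200, "2015-06-01 12:01:00"),   # same transaction_no, other location
    (4, 20.00, "T2", 100, "2015-06-01 13:00:00"),
    (5, 21.00, "T2", 100, "2015-06-01 13:30:00"),  # later duplicate of id 4
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts ("
    "id INTEGER, amount REAL, transaction_no TEXT, "
    "location_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO receipts VALUES (?, ?, ?, ?, ?)", rows)

# Same shape as the BigQuery query: number the rows inside each
# (transaction_no, location_id) group, newest first, and keep row 1.
query = """
SELECT id, amount, transaction_no, location_id, created_at
FROM (
  SELECT id, amount, transaction_no, location_id, created_at,
         ROW_NUMBER() OVER (PARTITION BY transaction_no, location_id
                            ORDER BY created_at DESC) AS rn
  FROM receipts
)
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(query).fetchall()
latest_ids = [r[0] for r in latest]
print(latest_ids)  # [2, 3, 5]
```

Because the window function ranks rows within each (transaction_no, location_id) group directly, the query needs no self-join, so it avoids the join that hit the broadcast limit in the question.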