A mystery with BigQuery views

Here is my mystery. When I run this query in the console, it works perfectly well:

SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id, 1 AS dummy FROM bq_000010.table) rd
  INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy);

I then save it as a view named dataset.myview and run:

SELECT * FROM dataset.myview LIMIT 1000

But this raises the following error:

SELECT query which references non constant fields or uses aggregation functions or has one or more of WHERE, OMIT IF, GROUP BY, ORDER BY clauses must have FROM clause.

However, when I try SELECT * FROM dataset.myview, i.e. without the LIMIT, it works!

In fact, when I run the full query with a LIMIT at the end, it raises the same error:

SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id, 1 AS dummy FROM bq_000010.table) rd
  INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy) LIMIT 1000;

Yet when I add an ORDER BY to the inner select, it runs fine again:

SELECT rd.ds_id AS ds_id
FROM (SELECT ds_id,
             1 AS dummy
      FROM bq_000010.000010_flux_visites_ds
      ORDER BY ds_id) rd
  INNER JOIN EACH (SELECT 1 AS dummy) cal ON (cal.dummy = rd.dummy) LIMIT 1000

What happens if you apply an ORDER BY to the select in the view? Or do you need random results?

A query with a LIMIT clause may still be non-deterministic if there is no operator in the query that guarantees the ordering of the output result set. This is because BigQuery executes using a large number of parallel workers. The order in which parallel jobs return is not guaranteed.

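I'm not sure why the ordering makes a difference here. However, it is usually odd to see a LIMIT without any ordering, which is why I asked about the ORDER BY. A complete SWAG would be that the parallel workers finish the outer join and the LIMIT before the inner select completes, which causes an internal error, and that applying the ORDER BY forces the records to be materialized before the inner join is executed.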

But I really have no clue.
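
To make the ORDER BY suggestion from the comments concrete, here is a minimal sketch in legacy BigQuery SQL that queries the view with an explicit outer ORDER BY. It uses only the view name and column from the question. The ordering makes the rows returned by the LIMIT deterministic, in line with the documentation quoted above; whether it also avoids the "must have FROM clause" error is an untested assumption.

```sql
-- Sketch only: dataset.myview and ds_id are the names from the question.
-- The outer ORDER BY fixes which 1000 rows the LIMIT returns.
-- Assumption (untested): this may also sidestep the FROM-clause error,
-- as the inner ORDER BY did in the question's last query.
SELECT ds_id
FROM dataset.myview
ORDER BY ds_id
LIMIT 1000;
```

If this form fails in the same way, the inner ORDER BY shown in the question remains the only workaround reported in this thread.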