How to add filters to the SQL `WHERE` clause in a BigQuery stored procedure?

Question

A table in BigQuery is partitioned by a timestamp column, and I have written a simple procedure that works fine.

Here it is:

CREATE PROCEDURE DATASET_ID.TEST(from_timestamp  timestamp, to_timestamp  timestamp)
BEGIN
  SELECT
      *
  FROM
     DATASET_ID.TABLE_ID
  WHERE
      timestamp>=from_timestamp and timestamp<=to_timestamp;
END;

Now, when I add more filters, BigQuery throws an error.

The procedure with more filters:

CREATE PROCEDURE DATASET_ID.TEST(from_timestamp  timestamp, to_timestamp  timestamp)
BEGIN
  SELECT
      *
  FROM
     DATASET_ID.TABLE_ID
  WHERE
      timestamp>=from_timestamp and timestamp<=to_timestamp
      and app_id="xyz";
END;

Error validating procedure body (add OPTIONS(strict_mode=false) to suppress): Query error: Query error: Cannot query over table ' DATASET_ID.TABLE_ID' without a filter over column(s) 'timestamp' that can be used for partition elimination at [3:3]

What is the best way to add more filters to the WHERE clause in a stored procedure?

Answer 1

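According to the documentation, when a partitioned table requires a partition filter (the require_partition_filter option), every query against it must include a filter on the partitioning column that BigQuery can use to select partitions.

Besides that, CREATE PROCEDURE has an optional flag, strict_mode, which behaves as follows:

If set to TRUE: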

The procedure body will undergo additional checks for errors such as non-existent tables or columns. The CREATE PROCEDURE statement will fail if the body fails any of these checks.

If set to FALSE:

The procedure body is checked only for syntax. Procedures which invoke themselves recursively should be created with strict_mode=FALSE to avoid errors caused by the procedure not yet existing while it is being validated

The default is TRUE.
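The recursion case mentioned in the quoted documentation could look roughly like the sketch below. The procedure name `countdown` and its parameter are hypothetical and only illustrate why a self-calling procedure is created with strict_mode=FALSE: under the default, validation would look up a procedure that does not exist yet.

```sql
-- Hypothetical recursive procedure (not from the original question).
-- strict_mode = FALSE limits validation to syntax, so the self-reference
-- below does not fail while the procedure is still being created.
CREATE OR REPLACE PROCEDURE `project_id.dataset.countdown`(n INT64)
OPTIONS (strict_mode = FALSE)
BEGIN
  IF n > 0 THEN
    SELECT n AS remaining;
    -- The procedure calls itself with a smaller argument.
    CALL `project_id.dataset.countdown`(n - 1);
  END IF;
END;
```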

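I was able to reproduce your case with a timestamp-partitioned table: the procedure was created successfully with additional filters in the WHERE clause. Below is the table I used: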

Row _time                   dummy_column
1   2020-06-15 23:57:00 UTC a
2   2020-06-15 23:58:00 UTC b
3   2020-06-15 23:59:00 UTC c
4   2020-06-16 00:00:00 UTC d
5   2020-06-16 00:00:01 UTC e
6   2020-06-16 00:00:02 UTC f

The table is partitioned by the field _time, which is a TIMESTAMP.
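For anyone who wants to reproduce this, a table like the one above could be set up roughly as follows. This is only a sketch: the answer does not show the DDL, so the daily partitioning granularity and the require_partition_filter option are assumptions, the latter chosen to mirror the behaviour described in the question.

```sql
-- Hypothetical reconstruction of the sample table. Daily partitioning and
-- require_partition_filter are assumptions, not taken from the answer.
CREATE TABLE `project_id.dataset.partitioned_table`
(
  _time TIMESTAMP,
  dummy_column STRING
)
PARTITION BY DATE(_time)
OPTIONS (require_partition_filter = TRUE);

-- Load the six sample rows listed above.
INSERT INTO `project_id.dataset.partitioned_table` (_time, dummy_column)
VALUES
  (TIMESTAMP("2020-06-15 23:57:00+00"), "a"),
  (TIMESTAMP("2020-06-15 23:58:00+00"), "b"),
  (TIMESTAMP("2020-06-15 23:59:00+00"), "c"),
  (TIMESTAMP("2020-06-16 00:00:00+00"), "d"),
  (TIMESTAMP("2020-06-16 00:00:01+00"), "e"),
  (TIMESTAMP("2020-06-16 00:00:02+00"), "f");
```

With require_partition_filter enabled, a query on this table is rejected unless its WHERE clause constrains _time.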

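To create the procedure with the timestamp given as a range, I used the BETWEEN operator. After that, I added an extra filter, dummy_column="d". The final procedure is as follows: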

CREATE OR REPLACE PROCEDURE `project_id.dataset.procedure`(from_ts TIMESTAMP, to_ts TIMESTAMP)
BEGIN
  select *
  from `project_id.dataset.partitioned_table` 
  where _time BETWEEN from_ts and to_ts and dummy_column="d";
END;

Note that I used two filters in the WHERE clause. I then called the procedure as follows:

DECLARE from_ts TIMESTAMP DEFAULT TIMESTAMP("2008-12-25 05:30:00+00");
DECLARE to_ts TIMESTAMP DEFAULT TIMESTAMP("2020-12-25 05:30:00+00");
CALL `project_id.dataset.procedure`(from_ts, to_ts);

And the output:

Row _time                   dummy_column    
1   2020-06-16 00:00:00 UTC d

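As shown above, the procedure runs successfully with strict_mode=TRUE (the default). Setting it to FALSE produces the same output without any errors. The syntax is as follows: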

CREATE OR REPLACE PROCEDURE `project_id.dataset.procedure`(from_ts TIMESTAMP, to_ts TIMESTAMP)
OPTIONS(strict_mode=FALSE)
BEGIN
  select *
  from `project_id.dataset.partitioned_table` 
  where _time BETWEEN from_ts and to_ts and dummy_column="d";
END;

So, if you follow the steps above, you should not encounter any errors.
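Applied to the procedure from the question, the same pattern might look like the sketch below. It has not been checked against the asker's table, so whether it clears the original validation error is an assumption. Backticks around the column name `timestamp` are added only to keep it visually distinct from the TIMESTAMP type.

```sql
-- Sketch: the question's procedure rewritten with BETWEEN on the
-- partitioning column, followed by the extra app_id filter.
CREATE OR REPLACE PROCEDURE DATASET_ID.TEST(from_timestamp TIMESTAMP, to_timestamp TIMESTAMP)
-- Optional, as the error message itself suggests: relax validation to
-- syntax-only checks by uncommenting the next line.
-- OPTIONS (strict_mode = FALSE)
BEGIN
  SELECT
    *
  FROM
    DATASET_ID.TABLE_ID
  WHERE
    `timestamp` BETWEEN from_timestamp AND to_timestamp
    AND app_id = "xyz";
END;
```

Note that strict_mode=FALSE only affects validation when the procedure is created; the query still needs a usable partition filter when the procedure is called.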

sql

stored-procedures

google-bigquery