How to add a filter (SQL `WHERE` clause) in a BigQuery stored procedure?

In BigQuery, my table is partitioned by a timestamp column, and I wrote a simple procedure that works fine.

Here it is:

CREATE PROCEDURE DATASET_ID.TEST(from_timestamp  timestamp, to_timestamp  timestamp)
BEGIN
  SELECT
      *
  FROM
     DATASET_ID.TABLE_ID
  WHERE
      timestamp>=from_timestamp and timestamp<=to_timestamp;
END;

Now, when I add more filters, BigQuery throws an error.

Procedure with more filters:

CREATE PROCEDURE DATASET_ID.TEST(from_timestamp  timestamp, to_timestamp  timestamp)
BEGIN
  SELECT
      *
  FROM
     DATASET_ID.TABLE_ID
  WHERE
      timestamp>=from_timestamp and timestamp<=to_timestamp
      and app_id="xyz";
END;

BigQuery returns the following error:

Error validating procedure body (add OPTIONS(strict_mode=false) to suppress): Query error: Query error: Cannot query over table ' DATASET_ID.TABLE_ID' without a filter over column(s) 'timestamp' that can be used for partition elimination at [3:3]

What is the best way to add more filters to the WHERE clause in a stored procedure?

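According to the documentation, when a partitioned table requires a partition filter (the require_partition_filter option), every query against it must include a filter on the partitioning column that specifies which partitions will be queried.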
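To illustrate the rule above, here is a sketch of one query that would be rejected and one that would be accepted. It assumes the table was created with `require_partition_filter = TRUE` and is partitioned on `_time` (the sample table described below); the project and dataset names are placeholders.

```sql
-- Assumption: `project_id.dataset.partitioned_table` is partitioned on _time
-- and was created with OPTIONS (require_partition_filter = TRUE).

-- Rejected: there is no predicate on the partitioning column, so BigQuery
-- cannot eliminate any partitions.
SELECT *
FROM `project_id.dataset.partitioned_table`
WHERE dummy_column = "d";

-- Accepted: the _time predicate allows partition elimination; the extra
-- filter on dummy_column is simply combined with AND.
SELECT *
FROM `project_id.dataset.partitioned_table`
WHERE _time BETWEEN TIMESTAMP "2020-06-16 00:00:00+00"
                AND TIMESTAMP "2020-06-17 00:00:00+00"
  AND dummy_column = "d";
```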

Besides that, CREATE PROCEDURE has an optional flag, strict_mode, with the following behavior:

  1. If set to TRUE:

The procedure body will undergo additional checks for errors such as non-existent tables or columns. The CREATE PROCEDURE statement will fail if the body fails any of these checks.

  2. If set to FALSE:

The procedure body is checked only for syntax. Procedures which invoke themselves recursively should be created with strict_mode=FALSE to avoid errors caused by the procedure not yet existing while it is being validated.

The default is TRUE.

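I was able to reproduce your case with a timestamp-partitioned table: I created the procedure successfully and added more filters to the WHERE clause. Below is the table I used: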

Row _time                   dummy_column
1   2020-06-15 23:57:00 UTC a
2   2020-06-15 23:58:00 UTC b
3   2020-06-15 23:59:00 UTC c
4   2020-06-16 00:00:00 UTC d
5   2020-06-16 00:00:01 UTC e
6   2020-06-16 00:00:02 UTC f

The table is partitioned by the field _time, which is a TIMESTAMP.
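If you want to reproduce this setup, the sketch below creates an equivalent table and loads the six sample rows. It is an assumption about how the table was built: daily partitioning via `PARTITION BY DATE(_time)`, and `require_partition_filter = TRUE` added to mirror the error in the question. The project and dataset names are placeholders.

```sql
-- Hypothetical DDL for the sample table; names are placeholders.
CREATE OR REPLACE TABLE `project_id.dataset.partitioned_table` (
  _time TIMESTAMP,
  dummy_column STRING
)
PARTITION BY DATE(_time)                   -- assumed daily partitions on _time
OPTIONS (require_partition_filter = TRUE); -- assumed, to mirror the error above

-- Load the six rows shown in the table above.
INSERT INTO `project_id.dataset.partitioned_table` (_time, dummy_column)
VALUES
  (TIMESTAMP "2020-06-15 23:57:00+00", "a"),
  (TIMESTAMP "2020-06-15 23:58:00+00", "b"),
  (TIMESTAMP "2020-06-15 23:59:00+00", "c"),
  (TIMESTAMP "2020-06-16 00:00:00+00", "d"),
  (TIMESTAMP "2020-06-16 00:00:01+00", "e"),
  (TIMESTAMP "2020-06-16 00:00:02+00", "f");
```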

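To create the procedure over a time range, I used the BETWEEN operator. Then, inside the stored procedure, I added an extra filter, dummy_column="d". The final procedure is as follows: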

CREATE OR REPLACE PROCEDURE `project_id.dataset.procedure`(from_ts TIMESTAMP, to_ts TIMESTAMP)
BEGIN
  select *
  from `project_id.dataset.partitioned_table` 
  where _time BETWEEN from_ts and to_ts and dummy_column="d";
END;

Note that I used two filters in the WHERE clause. Afterwards, I called the procedure as follows:

DECLARE from_ts TIMESTAMP DEFAULT TIMESTAMP("2008-12-25 05:30:00+00");
DECLARE to_ts TIMESTAMP DEFAULT TIMESTAMP("2020-12-25 05:30:00+00");
CALL `project_id.dataset.procedure`(from_ts, to_ts);

And the output:

Row _time                   dummy_column    
1   2020-06-16 00:00:00 UTC d   
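The DECLARE statements are optional; as a sketch, the timestamps can also be passed to the procedure as literals. The narrower range used here is an arbitrary choice for illustration. Since BETWEEN is inclusive on both ends, the row at 2020-06-16 00:00:00 UTC still falls inside it, so with the sample data this call should return the same single row.

```sql
-- Pass the range boundaries directly as TIMESTAMP literals.
CALL `project_id.dataset.procedure`(
  TIMESTAMP "2020-06-16 00:00:00+00",  -- inclusive lower bound
  TIMESTAMP "2020-06-17 00:00:00+00"   -- inclusive upper bound
);
```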

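As shown above, the procedure ran successfully with strict_mode=TRUE (the default). Setting it to FALSE also produces the same output without any errors. The syntax is as follows: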

CREATE OR REPLACE PROCEDURE `project_id.dataset.procedure`(from_ts TIMESTAMP, to_ts TIMESTAMP)
OPTIONS(strict_mode=FALSE)
BEGIN
  select *
  from `project_id.dataset.partitioned_table` 
  where _time BETWEEN from_ts and to_ts and dummy_column="d";
END;

So if you follow the steps above, you should not run into any errors.
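One way to add more filters without hard-coding their values is to pass them as extra procedure parameters. The sketch below assumes the same sample table; the procedure name `procedure_with_filter` and the parameter `filter_value` are hypothetical. Passing NULL for `filter_value` skips the extra filter.

```sql
-- Hypothetical variant: the extra filter value is a parameter, not a literal.
CREATE OR REPLACE PROCEDURE `project_id.dataset.procedure_with_filter`(
  from_ts TIMESTAMP,
  to_ts TIMESTAMP,
  filter_value STRING  -- hypothetical parameter; NULL disables the extra filter
)
BEGIN
  SELECT *
  FROM `project_id.dataset.partitioned_table`
  -- Keep the _time predicate unconditional so partitions can still be pruned.
  WHERE _time BETWEEN from_ts AND to_ts
    AND (filter_value IS NULL OR dummy_column = filter_value);
END;

-- Example call: rows in the range whose dummy_column is "d".
CALL `project_id.dataset.procedure_with_filter`(
  TIMESTAMP "2020-06-15 00:00:00+00",
  TIMESTAMP "2020-06-17 00:00:00+00",
  "d"
);
```

The range predicate on `_time` sits outside the OR on purpose. Placing the partition predicate inside an OR with other conditions could stop BigQuery from using it for partition elimination.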