How to add more filters to the SQL `WHERE` clause in a BigQuery stored procedure?
The table in BigQuery is partitioned by a timestamp column, and I wrote a simple procedure that works fine.
Here it is:
CREATE PROCEDURE DATASET_ID.TEST(from_timestamp timestamp, to_timestamp timestamp)
BEGIN
SELECT
*
FROM
DATASET_ID.TABLE_ID
WHERE
timestamp>=from_timestamp and timestamp<=to_timestamp;
END;
Now, when I add more filters, BigQuery throws an error.
The procedure with more filters:
CREATE PROCEDURE DATASET_ID.TEST(from_timestamp timestamp, to_timestamp timestamp)
BEGIN
SELECT
*
FROM
DATASET_ID.TABLE_ID
WHERE
timestamp>=from_timestamp and timestamp<=to_timestamp
and app_id="xyz";
END;
Error validating procedure body (add OPTIONS(strict_mode=false) to suppress): Query error: Query error: Cannot query over table ' DATASET_ID.TABLE_ID' without a filter over column(s) 'timestamp' that can be used for partition elimination at [3:3]
What is the best way to add more filters to the WHERE clause in a stored procedure?
According to the documentation, when you have a partitioned table that requires a partition filter, you need to specify the partitions you will query.
Besides that, CREATE PROCEDURE has an optional flag, strict_mode:
- If set to TRUE: The procedure body will undergo additional checks for errors such as non-existent tables or columns. The CREATE PROCEDURE statement will fail if the body fails any of these checks.
- If set to FALSE: The procedure body is checked only for syntax. Procedures which invoke themselves recursively should be created with strict_mode=FALSE to avoid errors caused by the procedure not yet existing while it is being validated.
The default is TRUE.
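I was able to reproduce your case with a timestamp-partitioned table: the procedure was created successfully with additional filters in the WHERE clause. Below is the table I used: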
Row _time dummy_column
1 2020-06-15 23:57:00 UTC a
2 2020-06-15 23:58:00 UTC b
3 2020-06-15 23:59:00 UTC c
4 2020-06-16 00:00:00 UTC d
5 2020-06-16 00:00:01 UTC e
6 2020-06-16 00:00:02 UTC f
The table is partitioned on the field _time, which is a TIMESTAMP.
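For anyone who wants to reproduce this, the following is a minimal sketch of how such a table could be created and filled. The table name matches the one used in the procedure below; daily partitioning via DATE(_time), the STRING type of dummy_column, and the require_partition_filter option are my assumptions, not details taken from the original setup. That option is what makes BigQuery reject queries lacking a usable filter on the partitioning column.

```sql
-- Sketch only: daily partitioning and require_partition_filter are assumptions.
CREATE TABLE `project_id.dataset.partitioned_table`
(
  _time TIMESTAMP,
  dummy_column STRING
)
PARTITION BY DATE(_time)
OPTIONS (require_partition_filter = TRUE);

-- The six sample rows shown in the table above.
INSERT INTO `project_id.dataset.partitioned_table` (_time, dummy_column)
VALUES
  (TIMESTAMP '2020-06-15 23:57:00+00', 'a'),
  (TIMESTAMP '2020-06-15 23:58:00+00', 'b'),
  (TIMESTAMP '2020-06-15 23:59:00+00', 'c'),
  (TIMESTAMP '2020-06-16 00:00:00+00', 'd'),
  (TIMESTAMP '2020-06-16 00:00:01+00', 'e'),
  (TIMESTAMP '2020-06-16 00:00:02+00', 'f');
```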
To create a procedure that takes a time range, I used the BETWEEN operator. I then added an extra filter, dummy_column="d". The final procedure is as follows:
CREATE OR REPLACE PROCEDURE `project_id.dataset.procedure`(from_ts TIMESTAMP, to_ts TIMESTAMP)
BEGIN
select *
from `project_id.dataset.partitioned_table`
where _time BETWEEN from_ts and to_ts and dummy_column="d";
END;
Note that I used two filters in the WHERE clause. Afterwards, I called the procedure as follows:
DECLARE from_ts TIMESTAMP DEFAULT TIMESTAMP("2008-12-25 05:30:00+00");
DECLARE to_ts TIMESTAMP DEFAULT TIMESTAMP("2020-12-25 05:30:00+00");
CALL `project_id.dataset.procedure`(from_ts, to_ts);
And the output:
Row _time dummy_column
1 2020-06-16 00:00:00 UTC d
As shown above, it ran successfully with strict_mode=TRUE (the default). When it is set to FALSE, it produces the same output without any error. The syntax is as follows:
CREATE OR REPLACE PROCEDURE `project_id.dataset.procedure`(from_ts TIMESTAMP, to_ts TIMESTAMP)
OPTIONS(strict_mode=FALSE)
BEGIN
select *
from `project_id.dataset.partitioned_table`
where _time BETWEEN from_ts and to_ts and dummy_column="d";
END;
So if you follow the steps above, you should not see any errors.
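If the extra filter value should not be hard-coded, one possible variation is to pass it in as a third argument. This is only a sketch under assumptions: the procedure name filtered_by_value and the parameter wanted_value are hypothetical, and I have not run this exact variant against the table above.

```sql
-- Sketch only: procedure name and the wanted_value parameter are hypothetical.
CREATE OR REPLACE PROCEDURE `project_id.dataset.filtered_by_value`(
  from_ts TIMESTAMP,
  to_ts TIMESTAMP,
  wanted_value STRING
)
BEGIN
  SELECT *
  FROM `project_id.dataset.partitioned_table`
  -- The range on the partitioning column stays first-class in the WHERE clause.
  WHERE _time BETWEEN from_ts AND to_ts
    -- Additional, non-partition filter supplied by the caller.
    AND dummy_column = wanted_value;
END;

CALL `project_id.dataset.filtered_by_value`(
  TIMESTAMP '2020-06-15 00:00:00+00',
  TIMESTAMP '2020-06-17 00:00:00+00',
  'd'
);
```

Keeping the condition on _time in the same WHERE clause means the extra filter is added on top of the partition filter rather than replacing it.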