Issues in Extracting data from Big Query from second time using Dataflow [ apache beam ]


I need to extract data from a BigQuery table and write it to a GCS bucket using Dataflow.
The Dataflow pipeline is built with Apache Beam (Java). On its first run, the pipeline extracts from BigQuery and writes to GCS perfectly.

However, after the first pipeline completes successfully, when a second Dataflow job is launched to extract data from the same table, it does not pull any data from BigQuery. The only error I see in the Stackdriver logs is:

> Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://www.googleapis.com/bigquery/v2/projects/dataflow-begining/jobs

The sample code I use for the extraction is:

 pipeline.apply("Extract from BQ", BigQueryIO.readTableRows().fromQuery("SELECT * from bq_test.employee"))

Any help is appreciated.

I have seen this before when using templates. From the documentation here, under the Usage with templates section:

> When using read() or readTableRows() in a template, it's required to specify BigQueryIO.Read.withTemplateCompatibility(). Specifying this in a non-template pipeline is not recommended because it has somewhat lower performance.

and under the withTemplateCompatibility section:

> Use new template-compatible source implementation. This implementation is compatible with repeated template invocations.

If that is your case, you should use:

pipeline.apply("Extract from BQ", BigQueryIO
        .readTableRows()
        .withTemplateCompatibility()
        .fromQuery("SELECT * from bq_test.employee"))
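For context, here is a minimal sketch of the full read-and-write pipeline with the fix applied. The output path `gs://my-bucket/output` and the class name are placeholders, and the TableRow-to-string step is just one simple way to serialize rows for a text sink:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BqToGcsTemplate {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("Extract from BQ", BigQueryIO.readTableRows()
            // Required when the pipeline is staged as a template and
            // invoked repeatedly:
            .withTemplateCompatibility()
            .fromQuery("SELECT * from bq_test.employee"))
        // Serialize each TableRow to its string form for the text sink.
        .apply("To String", MapElements
            .into(TypeDescriptors.strings())
            .via(row -> row.toString()))
        // gs://my-bucket/output is a placeholder; use your own bucket.
        .apply("Write to GCS", TextIO.write().to("gs://my-bucket/output"));

    pipeline.run();
  }
}
```

With `withTemplateCompatibility()`, the BigQuery export job is re-created on each template invocation instead of being reused from the staged graph, which is what allows the second and later runs to extract data again.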