Issues extracting data from BigQuery a second time using Dataflow [Apache Beam]
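I need to use Dataflow to extract data from a BigQuery table and write it to a GCS bucket.
The Dataflow pipeline is built with Apache Beam (Java). The first time it runs, it extracts the data from BigQuery and writes it to GCS perfectly.
However, after the first pipeline has executed successfully and a second Dataflow job is launched to extract data from the same table, it does not extract any data from BigQuery. The only error I see in the Stackdriver logs is:
"Request failed with code 409, performed 0 retries due to IOExceptions, performed 0 retries due to unsuccessful status codes, HTTP framework says request can be retried, (caller responsible for retrying): https://www.googleapis.com/bigquery/v2/projects/dataflow-begining/jobs"
The sample code I use for the extraction is:
pipeline.apply("Extract from BQ", BigQueryIO.readTableRows().fromQuery("SELECT * from bq_test.employee"))
Any help is appreciated.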
I've seen this before when using templates. According to the documentation here, in the Usage with templates section:
When using read() or readTableRows() in a template, it's required to
specify BigQueryIO.Read.withTemplateCompatibility(). Specifying this
in a non-template pipeline is not recommended because it has somewhat
lower performance.
And in the withTemplateCompatibility section:
Use new template-compatible source implementation. This implementation
is compatible with repeated template invocations.
If you are running the pipeline from a template, you should use:
pipeline.apply("Extract from BQ", BigQueryIO
.readTableRows()
.withTemplateCompatibility()
.fromQuery("SELECT * from bq_test.employee"))
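For completeness, a minimal end-to-end sketch of the pipeline the question describes (read from BigQuery, write to GCS) with the template-compatible source might look like the following. It assumes the Beam Java SDK and its GCP IO module are on the classpath; the class name `BqToGcs`, the output prefix `gs://my-bucket/output/employee`, and writing each `TableRow` via `toString()` are hypothetical placeholders, not part of the original question.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

// Hypothetical class name; a sketch, not the asker's actual pipeline.
public class BqToGcs {
  // Query taken from the question.
  static final String QUERY = "SELECT * from bq_test.employee";
  // Hypothetical output prefix: replace with your own bucket and path.
  static final String OUTPUT_PREFIX = "gs://my-bucket/output/employee";

  // Builds the BigQuery read. withTemplateCompatibility() selects the
  // template-compatible source, which the Javadoc describes as compatible
  // with repeated template invocations.
  static BigQueryIO.TypedRead<TableRow> employeeRead() {
    return BigQueryIO.readTableRows()
        .withTemplateCompatibility()
        .fromQuery(QUERY);
  }

  public static void main(String[] args) {
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("Extract from BQ", employeeRead())
        // Hypothetical formatting choice: one text line per TableRow.
        .apply("Format rows",
            MapElements.into(TypeDescriptors.strings())
                .via((TableRow row) -> row.toString()))
        // Writes sharded text files under OUTPUT_PREFIX.
        .apply("Write to GCS", TextIO.write().to(OUTPUT_PREFIX));

    pipeline.run();
  }
}
```

If the pipeline is not run as a template, the documentation quoted above recommends leaving out withTemplateCompatibility() because of its somewhat lower performance.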