TableRow.get("field_name") can only be cast to String in a Dataflow ParDo

Question

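I'm exporting a table from BQ through Dataflow, and it seems that during ParDo processing I can only get the "string" value of each field in a TableRow, regardless of the field's original data type in the BQ schema.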

For example, suppose my table has a column "fieldA" of type INTEGER:

    public void processElement(ProcessContext c) throws Exception {
        TableRow row = c.element();
        String str = (String) row.get("fieldA");   // OK
        Integer i = (Integer) row.get("fieldA");  // throws ClassCastException: "String cannot be cast to Integer"
    }

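Is this a bug, or is it just me? If it's not just me, is there a way around it? For integer types I can still do Integer.valueOf(String), but parsing Timestamp fields that way is bound to be clunky and error-prone.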
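For the Timestamp case, the string can be handled with java.time rather than hand-rolled parsing. A minimal sketch, assuming Java 8 or later: `BqTimestamps` and `parse` are hypothetical names, and the input format `yyyy-MM-dd HH:mm:ss[.fraction] UTC` is an assumption about how TIMESTAMP values appear in the TableRow, so log a real value from your own table before relying on it.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical helper for TIMESTAMP values that arrive as strings.
// ASSUMED input format: "yyyy-MM-dd HH:mm:ss[.fraction] UTC"; verify against your own rows.
final class BqTimestamps {

    private static final DateTimeFormatter BQ_TIMESTAMP = new DateTimeFormatterBuilder()
            .appendPattern("uuuu-MM-dd HH:mm:ss")
            // 0 to 9 fractional digits; the leading '.' is optional because the min width is 0.
            .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
            .appendLiteral(" UTC")
            .toFormatter();

    static Instant parse(String value) {
        // The zone is only the literal "UTC", so parse as a local date-time and pin it to UTC.
        return LocalDateTime.parse(value, BQ_TIMESTAMP).toInstant(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(parse("2016-03-01 12:34:56.123 UTC")); // prints 2016-03-01T12:34:56.123Z
    }
}
```

Inside processElement this would be called as `BqTimestamps.parse((String) row.get("fieldT"))`, where "fieldT" is a hypothetical TIMESTAMP column.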

FYI, I'm using BlockingDataflowPipelineRunner.

Answer 1

According to BigQueryTableRowIterator:

Note that integers are encoded as strings to match BigQuery's exported JSON format.

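So you need Integer.parseInt. Sorry for the trouble. We should improve the documentation on how values are typed in a TableRow when reading from BigQueryIO.Read, since that documentation is not easy to find.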
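A minimal sketch of that conversion, kept independent of the Dataflow SDK so it runs on its own. TableRow (com.google.api.services.bigquery.model.TableRow) extends GenericData, which is a Map<String, Object>, so the helper takes a Map and a TableRow can be passed in directly. The names `TableRowFields` and `getLong` are hypothetical. It uses Long.parseLong instead of Integer.parseInt because BigQuery INTEGER columns are 64-bit and can hold values larger than Integer.MAX_VALUE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; TableRow is a Map<String, Object>, so a TableRow can be passed in directly.
final class TableRowFields {

    /**
     * Reads a BigQuery INTEGER column. Rows from BigQueryIO.Read carry it as a String
     * (matching BigQuery's exported JSON), so the value is parsed rather than cast.
     * Returns null for a missing or NULL field.
     */
    static Long getLong(Map<String, Object> row, String field) {
        Object value = row.get(field);
        if (value == null) {
            return null;
        }
        if (value instanceof Number) {
            // Defensive: accept an already-numeric value, e.g. a row built by hand in a test.
            return ((Number) value).longValue();
        }
        return Long.parseLong(value.toString());
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("fieldA", "42"); // an INTEGER column as it arrives from BigQueryIO.Read
        Long a = getLong(row, "fieldA");
        System.out.println(a + 1); // prints 43
    }
}
```

In the question's processElement, `Long i = TableRowFields.getLong(row, "fieldA");` would replace the failing `(Integer)` cast.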

