What's the motivation behind implementing BigQuery UDFs as the map step of MapReduce?

google-bigquery

Google BigQuery now supports UDFs that work like the mapper in MapReduce.

BigQuery supports user-defined functions (UDFs) written in JavaScript. A UDF is similar to the "Map" function in a MapReduce: it takes a single row as input and produces zero or more rows as output. The output can potentially have a different schema than the input.

From https://cloud.google.com/bigquery/user-defined-functions
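To make the "zero or more rows, possibly with a different schema" part concrete, here is a minimal sketch in plain JavaScript. The `(row, emit)` shape follows the description quoted above; the names `explodeTags` and `runMap`, and the sample fields `id` and `tags`, are made up for illustration. `runMap` is only a local stand-in for the query engine so the snippet runs outside BigQuery.

```javascript
// Hypothetical row-wise function: takes one input row and emits
// one output row per element of the repeated field `tags`.
// The output schema (id, tag) differs from the input schema (id, tags).
function explodeTags(row, emit) {
  const tags = row.tags || [];
  for (const tag of tags) {
    emit({ id: row.id, tag: tag });
  }
  // A row with no tags emits nothing, i.e. zero output rows.
}

// Local driver standing in for the query engine (an assumption,
// not part of BigQuery): calls fn once per row and collects output.
function runMap(rows, fn) {
  const out = [];
  for (const row of rows) {
    fn(row, (r) => out.push(r));
  }
  return out;
}

const exploded = runMap(
  [
    { id: 1, tags: ["a", "b"] },
    { id: 2, tags: [] },
  ],
  explodeTags
);
console.log(exploded);
```

The first row produces two output rows and the second produces none, which is something a pure column-to-column function cannot express.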

What is the motivation for implementing UDFs over rows, rather than letting a UDF work as a pure function on columns/fields, the way UDFs work in Hive (https://cwiki.apache.org/confluence/display/Hive/HivePlugins)?

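I think any UDF that works on columns (like a Hive UDF) can be expressed as a UDF that works on rows (a BigQuery UDF), but not the other way around. This can be done by defining a UDF in BigQuery whose input and output schemas are the same as the dataset's, with every value passed through unchanged except the field you want to apply the function to.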
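As a rough sketch of that pass-through idea in plain JavaScript: `liftScalar` is a hypothetical helper (not a BigQuery API) that wraps a scalar function into a `(row, emit)` function, copying every field unchanged except the one being transformed. The field names are invented, and the schema declaration that BigQuery would need when the function is registered is left out here.

```javascript
// Hypothetical helper: turn a scalar (column-style) function into a
// row-style function that passes all other fields through untouched.
function liftScalar(field, scalarFn) {
  return function (row, emit) {
    const out = Object.assign({}, row); // shallow copy of the input row
    out[field] = scalarFn(row[field]);  // transform only the chosen field
    emit(out);                          // exactly one output row per input row
  };
}

// Usage: upper-case the `name` field, keep `id` and `city` as they are.
const upperName = liftScalar("name", (s) => s.toUpperCase());

const results = [];
upperName({ id: 7, name: "ada", city: "London" }, (r) => results.push(r));
console.log(results);
```

The wrapper itself does not depend on the schema, but the input and output schemas would presumably still have to be spelled out for each dataset it is applied to.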

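This is of course cumbersome if you want to apply the same function to different datasets with different schemas. Please help me understand.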

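The current implementation of UDFs in BigQuery is only a first step. As you noticed, it is the most general approach if you want to be able to handle nested and repeated structures, but it becomes cumbersome when you only need simple scalar values. Expect future improvements in this area that make simple UDFs simple.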
