Apache Flink:How 是否对数据集 api 中的动态元组提供 groupBy/sortBy 转换支持？

Question

我使用 java.util.Map 作为位置或表达式键不支持的数据类型，那么如果组字段的数量超过 25，我该如何对基于 java.util.Map 的数据集进行分组？

示例代码如下：

Map<String,Object> input1 = new HashMap<>();
for (int i=0; i<30; i++){
    input.put("groupField" + i,"value"+i);
}
input1.put("quantity",200);
Map<String,Object> input2 = new HashMap<>();
for (int i=0; i<30; i++){
    input2.put("groupField" + i,"value"+i);
}
input2.put("quantity",200);

DataSet<Map<String,Object>> input = env.fromElements(input1,input2);
//how can i group this map based dataSet and aggregate on the 'quantity' field if the number of grouping fields is more than 25?

Answer 1

我建议使用自定义 class 作为分组和排序操作的键类型，即 KeySelector 中此键 class 的 return 个对象。如果自定义 class 正确实现了 java.lang.Comparable 接口和 Object.hashCode() 方法，则可以用作键。

例如，您的密钥类型可以是 java.util.ArrayList 的简单包装，以支持任意多个字段。

Apache Flink:How 是否对数据集 api 中的动态元组提供 groupBy/sortBy 转换支持？

Apache Flink:How does the groupBy/ sortBy transform support on the dynamic tuples in dataset api?

apache-flink