Apache Spark join/cogroup on generic type RDD
I have a question about the join and cogroup methods on RDDs. In detail, I have to join two RDDs, one of which holds values of a generic type used with a wildcard.
val indexedMeasures = measures.map(m => (m.id(), m)) // RDD[(String, Measure[_])]
val indexedRegistry = registry.map(r => (r.id, r)) // RDD[(String, Registry)]
indexedRegistry.cogroup(indexedMeasures)
The last statement gives a compile-time error, as follows:
no type parameters for method cogroup: (other: org.apache.spark.rdd.RDD[(String, W)])org.apache.spark.rdd.RDD[(String, (Iterable[Registry],
Iterable[W]))] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(String, Measure[?0]) forSome { type ?0 }]) --- because --- argument expression's type is not compatible
with formal parameter type; found : org.apache.spark.rdd.RDD[(String, Measure[?0]) forSome { type ?0 }] required: org.apache.spark.rdd.RDD[(String, ?W)] Note: (String,
Measure[?0]) forSome { type ?0 } >: (String, ?W), but class RDD is invariant in type T. You may wish to define T as -T instead. (SLS 4.5)
What is going on here? Why can't I cogroup RDDs that use a generic wildcard type?
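To make the mismatch concrete, the two type shapes involved can be spelled out as plain Scala types. This is only an illustration; the Measure trait below is a hypothetical stand-in for the real one:

object TypesInvolved {
  import scala.language.existentials
  import org.apache.spark.rdd.RDD

  trait Measure[T] { def id(): String } // hypothetical stand-in for the question's Measure

  // What scalac infers for measures.map(m => (m.id(), m)) when measures is RDD[Measure[_]]:
  // the existential quantifier is scoped over the whole pair, not just the value slot.
  type Inferred = RDD[(String, Measure[t]) forSome { type t }]

  // What cogroup[W] expects to receive: the value slot must be one single type W.
  type Expected[W] = RDD[(String, W)]
}

Because RDD is invariant in its element type, Inferred only conforms to Expected[W] if the pair types are equivalent, and no single W makes (String, W) equivalent to the existentially quantified pair, which is exactly what the error message says.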
Thanks for all the replies. The problem turns out to be the one discussed in the paper Towards Equal Rights for Higher-kinded Types:
Generics are a very popular feature of contemporary OO languages, such as Java, C# or Scala. Their support for genericity is lacking, however. The problem is that they only support abstracting over proper types, and not over generic types. This limitation makes it impossible to, e.g., define a precise interface for Iterable, a core abstraction in Scala’s collection API. We implemented “type constructor polymorphism” in Scala 2.5, which solves this problem at the root, thus greatly reducing the duplication of type signatures and code.
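For what it's worth, one workaround that I believe typechecks is to annotate indexedMeasures explicitly as RDD[(String, Measure[_])], so that the existential stays inside the value slot of the tuple and cogroup can infer W as Measure[_]. Below is a minimal sketch under that assumption, with hypothetical stand-ins for the Measure and Registry types from the question:

object CogroupWorkaround {
  import org.apache.spark.rdd.RDD

  trait Measure[T] { def id(): String } // hypothetical stand-in
  case class Registry(id: String)       // hypothetical stand-in

  // On Spark versions before 1.3, also import org.apache.spark.SparkContext._
  // to bring the pair-RDD operations (cogroup, join, ...) into scope.
  def cogroupMeasures(measures: RDD[Measure[_]], registry: RDD[Registry])
      : RDD[(String, (Iterable[Registry], Iterable[Measure[_]]))] = {
    // The explicit annotation keeps the existential scoped to the value type,
    // so cogroup's type parameter W can be inferred as Measure[_].
    val indexedMeasures: RDD[(String, Measure[_])] = measures.map(m => (m.id(), m))
    val indexedRegistry: RDD[(String, Registry)] = registry.map(r => (r.id, r))
    indexedRegistry.cogroup(indexedMeasures)
  }
}

The same annotation trick (or an explicit cast to the annotated type) should apply to join as well, since it has the same RDD[(K, W)] parameter shape.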