value join is not a member of org.apache.spark.rdd.RDD
I am getting this error:
value join is not a member of
org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0])))
forSome { type _0 <: (String, Double) }]
The only suggestion I have found is import org.apache.spark.SparkContext._, which I am already doing.
What am I doing wrong?
Edit: changing the code to eliminate the forSome (i.e., so that the object has type org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[(String, Double)])))]) fixed the problem.
Is this a bug in Spark?
join is indeed a member of org.apache.spark.rdd.PairRDDFunctions. So why doesn't the implicit class trigger?
scala> val s = Seq[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]()
scala> val r = sc.parallelize(s)
scala> r.join(r) // Gives your error message.
scala> val p = new org.apache.spark.rdd.PairRDDFunctions(r)
<console>:25: error: no type parameters for constructor PairRDDFunctions: (self: org.apache.spark.rdd.RDD[(K, V)])(implicit kt: scala.reflect.ClassTag[K], implicit vt: scala.reflect.ClassTag[V], implicit ord: Ordering[K])org.apache.spark.rdd.PairRDDFunctions[K,V] exist so that it can be applied to arguments (org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }])
--- because ---
argument expression's type is not compatible with formal parameter type;
found : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
required: org.apache.spark.rdd.RDD[(?K, ?V)]
Note: (Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) } >: (?K, ?V), but class RDD is invariant in type T.
You may wish to define T as -T instead. (SLS 4.5)
val p = new org.apache.spark.rdd.PairRDDFunctions(r)
^
<console>:25: error: type mismatch;
found : org.apache.spark.rdd.RDD[(Long, (Int, (Long, String, Array[_0]))) forSome { type _0 <: (String, Double) }]
required: org.apache.spark.rdd.RDD[(K, V)]
val p = new org.apache.spark.rdd.PairRDDFunctions(r)
I'm sure the error message is clear to everyone else, but for my own slow sake let's try to understand it. PairRDDFunctions has two type parameters, K and V. Your forSome applies to the whole pair, so it cannot be split into separate K and V types. There are no K and V such that RDD[(K, V)] would equal your RDD type.
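For contrast, a minimal sketch (with made-up values) of an ordinary pair type, where the compiler can solve for K and V and the implicit conversion applies:
scala> val ok = sc.parallelize(Seq((1L, "a"), (2L, "b")))
scala> ok.join(ok) // compiles: K = Long, V = String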
However, you can make the forSome apply to just the value instead of the whole pair. The join now works, because this type can be split into K and V:
scala> val s2 = Seq[(Long, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) })]()
scala> val r2 = sc.parallelize(s2)
scala> r2.join(r2)
res0: org.apache.spark.rdd.RDD[(Long, ((Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }, (Int, (Long, String, Array[_0])) forSome { type _0 <: (String, Double) }))] = MapPartitionsRDD[5] at join at <console>:26
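Alternatively, as the edit in the question notes, eliminating the existential type entirely also works; a minimal sketch (s3 and r3 are just illustrative names):
scala> val s3 = Seq[(Long, (Int, (Long, String, Array[(String, Double)])))]()
scala> val r3 = sc.parallelize(s3)
scala> r3.join(r3) // compiles: K = Long, V = (Int, (Long, String, Array[(String, Double)]))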
Consider joining two Spark RDDs.
Say rdd1.first is of the form (Int, Int, Float) = (1,957,299.98), while rdd2.first is of the form (Int, Int) = (25876,1), and the join should happen on the first field of the two RDDs.
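For concreteness, here is a minimal sketch of two such RDDs, built from the single records quoted above:
scala> val rdd1 = sc.parallelize(Seq((1, 957, 299.98f))) // RDD[(Int, Int, Float)]
scala> val rdd2 = sc.parallelize(Seq((25876, 1)))        // RDD[(Int, Int)]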
scala> rdd1.join(rdd2) --- results in an error :**: error:
value join is not a member of org.apache.spark.rdd.RDD[(Int, Int,
Float)]
Reason
Both RDDs should be in the form of key-value pairs.
Here rdd1, whose records have the form (1,957,299.98), does not obey this rule, while rdd2, with records of the form (25876,1), does.
Resolution
Convert the output of the first RDD from (1,957,299.98) into a key-value pair of the form (1,(957,299.98)), and then join it with rdd2, as shown below:
scala> val rdd1KV = rdd1.map{ case (k, a, b) => (k, (a, b)) } -- re-keyed as RDD[(Int, (Int, Float))]
scala> rdd1KV.first
res**: (Int, (Int, Float)) = (1,(957,299.98))
scala> rdd1KV.join(rdd2) -- join successful :)
By the way, join is a member of org.apache.spark.rdd.PairRDDFunctions. So make sure it is in scope in Eclipse or whatever IDE you want to run your code in.
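On Spark versions before 1.3, the implicit conversion that adds join to pair RDDs lives on SparkContext, so the import mentioned in the question is what brings it into scope; from Spark 1.3 on, the implicits moved to the RDD companion object and no import is needed:
import org.apache.spark.SparkContext._ // needed only on pre-1.3 Spark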
This article is also on my blog:
https://tips-to-code.blogspot.com/2018/08/apache-spark-error-resolution-value.html