scala.collection.mutable.ArrayBuffer cannot be cast to java.lang.Double (Spark)
I have a DataFrame like this:
root
|-- midx: double (nullable = true)
|-- future: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _1: long (nullable = false)
| | |-- _2: long (nullable = false)
Using this code, I'm trying to flatten it into the shape shown after the code:
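import org.apache.spark.sql.Row
import sqlContext.implicits._ // for the $"..." column syntax

val T = withFfutures.where($"midx" === 47.0).select("midx", "future").collect().map((row: Row) =>
  // Note: Row { ... } receives the whole mapped Seq as its only field
  Row {
    row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
      (row.getAs[Double]("midx"), e, f)
    }
  }
).toList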
root
|-- id: double (nullable = true)
|-- event: long (nullable = true)
|-- future: long (nullable = true)
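So the plan is to move the array of (event, future) pairs into a DataFrame that has those two fields as columns (plus the id). I'm trying to turn T into a DataFrame like this: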
import org.apache.spark.sql.types._

val schema = StructType(Seq(
    StructField("id", DoubleType, nullable = true)
  , StructField("event", LongType, nullable = true)
  , StructField("future", LongType, nullable = true)
))
val df = sqlContext.createDataFrame(context.parallelize(T), schema)
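For `createDataFrame(rowRDD, schema)`, each `Row` has to carry exactly one value per `StructField`, in the same order and with matching runtime types (Double, Long, Long here). A minimal sketch of that constraint, assuming Spark 2.x or later with a `SparkSession` (the question uses the older `sqlContext`/`context` handles); `ExplicitSchema`, `toFlatRows` and `build` are hypothetical names, and the input is modelled as plain `(midx, future)` pairs rather than collected `Row`s:

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, LongType, StructField, StructType}

// Hypothetical helper: builds the flat DataFrame while keeping an explicit schema.
object ExplicitSchema {
  val schema: StructType = StructType(Seq(
    StructField("id", DoubleType, nullable = true),
    StructField("event", LongType, nullable = true),
    StructField("future", LongType, nullable = true)
  ))

  // One Row per (event, future) pair; each Row has exactly three fields whose
  // runtime types (Double, Long, Long) line up with the schema above.
  def toFlatRows(records: Seq[(Double, Seq[(Long, Long)])]): Seq[Row] =
    records.flatMap { case (midx, future) =>
      future.map { case (e, f) => Row(midx, e, f) }
    }

  // Assumes an existing SparkSession; parallelizes the flat rows and applies the schema.
  def build(spark: SparkSession, records: Seq[(Double, Seq[(Long, Long)])]): DataFrame =
    spark.createDataFrame(spark.sparkContext.parallelize(toFlatRows(records)), schema)
}
```

The key difference from the attempt above is `flatMap` plus `Row(midx, e, f)`: each output `Row` is one flat record, never a `Row` wrapping a whole sequence.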
But when I try to view df, I get this error:
java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to java.lang.Double
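The error comes from the shape of T: `map` produces one output element per input row, and that element is the whole inner sequence, so each `Row` ends up with a single field holding a Seq (an `ArrayBuffer` at runtime) where the schema expects a Double. A plain-Scala sketch of the two shapes, with no Spark involved; `FlattenDemo`, `nested` and `flat` are hypothetical names, and each input record is modelled as a `(midx, future)` pair mirroring the schema at the top:

```scala
// Plain-Scala model of the two approaches (no Spark needed).
object FlattenDemo {
  type Input = (Double, Seq[(Long, Long)])

  // Like the first attempt: map keeps one element per input record,
  // and that element is a whole Seq of (id, event, future) triples.
  def nested(rows: Seq[Input]): Seq[Seq[(Double, Long, Long)]] =
    rows.map { case (midx, future) =>
      future.map { case (e, f) => (midx, e, f) }
    }

  // Like the fix: flatMap emits one (id, event, future) triple per array element.
  def flat(rows: Seq[Input]): Seq[(Double, Long, Long)] =
    rows.flatMap { case (midx, future) =>
      future.map { case (e, f) => (midx, e, f) }
    }
}
```

Called on the single record `(47.0, Seq((1L, 2L), (3L, 4L)))`, `nested` returns one element that is itself a two-element sequence, while `flat` returns the two triples `(47.0, 1, 2)` and `(47.0, 3, 4)`, which is the shape the three-column schema needs.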
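After a while I found the problem: the array of structs in the column has to be turned into separate rows first (flatMap over the array, instead of wrapping each mapped Seq in a single Row). So the final code that builds the DataFrame looks like this: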
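import org.apache.spark.sql.Row
import sqlContext.implicits._ // needed for toDF on an RDD of tuples

val T = withFfutures.select("midx", "future").collect().flatMap((row: Row) =>
  // One (id, event, future) tuple per element of the "future" array
  row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
    (row.getAs[Double]("midx"), e, f)
  }.toList
).toList
val all = context.parallelize(T).toDF("id", "event", "future")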
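A variant that skips `collect()` entirely, so the flattening runs on the executors instead of pulling everything to the driver, is Spark's built-in `explode` function (a different technique from the collect-and-flatMap fix above). A minimal sketch, assuming Spark 2.x or later and that the struct fields are named `_1`/`_2` as in the schema at the top; `FlattenFutures.flattenFutures` is a hypothetical helper name:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

object FlattenFutures {
  // Turns (midx, future: array<struct<_1: long, _2: long>>) into one row per
  // array element with columns id, event, future, without collecting to the driver.
  def flattenFutures(withFfutures: DataFrame): DataFrame =
    withFfutures
      .select(col("midx").as("id"), explode(col("future")).as("f"))
      .select(col("id"), col("f._1").as("event"), col("f._2").as("future"))
}
```

`explode` drops input rows whose array is null or empty; Spark 2.2+ also provides `explode_outer`, which keeps them with null `event`/`future` values.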