scala.collection.mutable.ArrayBuffer cannot be cast to java.lang.Double (Spark)

I have a DataFrame like this:

root
 |-- midx: double (nullable = true)
 |-- future: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _1: long (nullable = false)
 |    |    |-- _2: long (nullable = false)

Using the code below, I'm trying to transform it into something like this:

val T = withFfutures.where($"midx" === 47.0).select("midx", "future").collect().map((row: Row) =>
  Row {
    row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
      (row.getAs[Double]("midx"), e, f)
    }
  }
).toList

root
 |-- id: double (nullable = true)
 |-- event: long (nullable = true)
 |-- future: long (nullable = true)

So the plan is to flatten the array of (event, future) pairs into a DataFrame with those two fields as columns. I'm trying to turn T into a DataFrame like this:

val schema = StructType(Seq(
  StructField("id", DoubleType, nullable = true)
  , StructField("event", LongType, nullable = true)
  , StructField("future", LongType, nullable = true)
))

val df = sqlContext.createDataFrame(context.parallelize(T), schema)

But when I try to view df, I get this error:

java.lang.ClassCastException: scala.collection.mutable.ArrayBuffer cannot be cast to java.lang.Double
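A minimal sketch of what goes wrong (my own illustration, not from the Spark API docs; the values are made up): the map above wraps the whole sequence in a single one-field `Row`, so field 0 is a collection rather than the `Double` the schema promises for the first column.

```scala
import org.apache.spark.sql.Row

// Row { expr } is just Row(expr): a Row with ONE field whose value
// is the entire sequence of tuples.
val bad = Row(Seq((47.0, 1L, 2L), (47.0, 3L, 4L)))
// bad.get(0) is a Seq -- when the schema's first column is DoubleType,
// Spark tries to cast that collection to java.lang.Double and fails.

// What a three-column schema actually expects: one scalar per field.
val good = Row(47.0, 1L, 2L)
```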

After a while, I found the problem: the array of structs in the column has to be flattened into individual rows first, instead of being wrapped in a single Row. So the final code to build the DataFrame should look like this:

val T = withFfutures.select("midx","future").collect().flatMap( (row: Row) =>
    row.getAs[Seq[Row]]("future").map { case Row(e: Long, f: Long) =>
      (row.getAs[Double]("midx") , e, f)
    }.toList
).toList

val all = context.parallelize(T).toDF("id","event","future")
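As an aside, the same flattening can be done without pulling the data to the driver via `collect()`. This is a sketch assuming the same `withFfutures` DataFrame and that `sqlContext.implicits._` is in scope (as the `$"..."` syntax above already requires); it uses Spark SQL's built-in `explode` function:

```scala
import org.apache.spark.sql.functions.explode

// explode() produces one output row per element of the `future` array,
// so the flattening stays distributed across the cluster.
val all = withFfutures
  .select($"midx".as("id"), explode($"future").as("f"))
  .select($"id", $"f._1".as("event"), $"f._2".as("future"))
```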