Why does a case class return a DataFrame in Spark SQL?

I'll use Union as the only example here.

I'm reading the Spark SQL source code and got stuck on this piece of code, which lives in DataFrame.scala:

def unionAll(other: DataFrame): DataFrame = Union(logicalPlan, other.logicalPlan)

And Union is a case class defined like this:

case class Union(left: LogicalPlan, right: LogicalPlan) extends BinaryNode {...}

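I'm confused: unionAll is declared to return a DataFrame, but its body is a Union, which is a LogicalPlan. How can the result be treated as an instance of DataFrame?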

Well, if something in Scala is unclear, it must be an implicit. First, let's take a look at the BinaryNode definition:

abstract class BinaryNode extends LogicalPlan

So a Union is a LogicalPlan. Since a LogicalPlan combined with a SQLContext is all that is required to create a DataFrame, this looks like a good place for a conversion. And here it is, defined in DataFrame.scala:

@inline private implicit def logicalPlanToDataFrame(logicalPlan: LogicalPlan): 
    DataFrame = {
  new DataFrame(sqlContext, logicalPlan)
}
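
To see the mechanism in isolation, here is a minimal, self-contained sketch. The classes below are simplified stand-ins, not the real Spark types: `Relation` is a hypothetical leaf node added for illustration, and the `SQLContext` argument is left out. When the compiler finds a `Union` where a `DataFrame` is expected, it looks for an implicit view in scope and wraps the expression with it.

```scala
import scala.language.implicitConversions

// Simplified stand-ins for the Spark classes (assumptions, not the real API).
sealed trait LogicalPlan
abstract class BinaryNode extends LogicalPlan
case class Relation(name: String) extends LogicalPlan // hypothetical leaf node
case class Union(left: LogicalPlan, right: LogicalPlan) extends BinaryNode

class DataFrame(val logicalPlan: LogicalPlan) {
  // In scope everywhere inside the class body, so any LogicalPlan
  // can appear where a DataFrame is expected.
  private implicit def logicalPlanToDataFrame(plan: LogicalPlan): DataFrame =
    new DataFrame(plan)

  // The body has type Union, the declared result type is DataFrame.
  // The compiler rewrites it to:
  //   logicalPlanToDataFrame(Union(logicalPlan, other.logicalPlan))
  def unionAll(other: DataFrame): DataFrame = Union(logicalPlan, other.logicalPlan)
}

object ImplicitDemo {
  def main(args: Array[String]): Unit = {
    val a = new DataFrame(Relation("a"))
    val b = new DataFrame(Relation("b"))
    val result = a.unionAll(b)
    println(result.logicalPlan) // prints Union(Relation(a),Relation(b))
  }
}
```

Without the implicit def, `unionAll` would fail to compile with a type mismatch, and the body would have to wrap the plan explicitly with `new DataFrame(...)`.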

In fact, this conversion was removed in 1.6.0 by SPARK-11513, with the following description:

DataFrame has an internal implicit conversion that turns a LogicalPlan into a DataFrame. This has been fairly confusing to a few new contributors. Since it doesn't buy us much, we should just remove that implicit conversion.