Sum up rows in DataFrame

Given a DataFrame, for example:

val df = sc.parallelize(Seq((1L, 0.1), (2L, 0.2), (3L, 0.3))).toDF("k","v")

df.show
+---+---+
|  k|  v|
+---+---+
|  1|0.1|
|  2|0.2|
|  3|0.3|
+---+---+

how can I sum each row into a new column named totals, so that dfTotals.show gives:

+---+---+--------+
|  k|  v|  totals|
+---+---+--------+
|  1|0.1|     1.1|
|  2|0.2|     2.2|
|  3|0.3|     3.3|
+---+---+--------+

I found a solution simpler than I first imagined:

val totals = ($"k" + $"v")
val dfTotals = df.withColumn("totals", totals)

so that

dfTotals.show
+---+---+------+
|  k|  v|totals|
+---+---+------+
|  1|0.1|   1.1|
|  2|0.2|   2.2|
|  3|0.3|   3.3|
+---+---+------+
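
If there are more than two columns to add, the same idea can be generalized by folding a list of column names into a single Column expression. The following is only a sketch: it assumes Spark 2.x or later (SparkSession rather than the sc used above), and the names RowTotals and columnsToSum are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RowTotals {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-totals")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1L, 0.1), (2L, 0.2), (3L, 0.3)).toDF("k", "v")

    // Hypothetical list of the columns that should be summed per row.
    val columnsToSum = Seq("k", "v")

    // Turn each name into a Column, then add them pairwise: col("k") + col("v").
    val totals = columnsToSum.map(col).reduce(_ + _)

    val dfTotals = df.withColumn("totals", totals)
    dfTotals.show()

    spark.stop()
  }
}
```

For the two-column case this builds the same expression as $"k" + $"v"; the benefit only shows once the list of columns grows or is computed at runtime.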

Update: another way, though not as tidy:

df.select(df("k"), df("v"), df("k")+df("v"))
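
Note that with select the computed column gets an auto-generated name derived from the expression unless it is aliased. A minimal sketch of the aliased variant, again assuming Spark 2.x or later and a hypothetical object name SelectTotals:

```scala
import org.apache.spark.sql.SparkSession

object SelectTotals {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("select-totals")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1L, 0.1), (2L, 0.2), (3L, 0.3)).toDF("k", "v")

    // .as("totals") names the summed column so it matches the withColumn result.
    val dfTotals = df.select(df("k"), df("v"), (df("k") + df("v")).as("totals"))
    dfTotals.show()

    spark.stop()
  }
}
```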