Sum up rows in DataFrame
Given a DataFrame, for example
val df = sc.parallelize(Seq((1L, 0.1), (2L, 0.2), (3L, 0.3))).toDF("k","v")
df.show
+---+---+
| k| v|
+---+---+
| 1|0.1|
| 2|0.2|
| 3|0.3|
+---+---+
how can the values in each row be summed into a new column named totals, so that dfTotals.show gives
+---+---+--------+
| k| v| totals|
+---+---+--------+
| 1|0.1| 1.1|
| 2|0.2| 2.2|
| 3|0.3| 3.3|
+---+---+--------+
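Found a simpler solution than originally expected: build the sum as a column expression and attach it with withColumn,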
val totals = ($"k" + $"v")
val dfTotals = df.withColumn("totals", totals)
and so
dfTotals.show
+---+---+------+
| k| v|totals|
+---+---+------+
| 1|0.1| 1.1|
| 2|0.2| 2.2|
| 3|0.3| 3.3|
+---+---+------+
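Update: another way, though not as neat, is to select the original columns together with the sum. Without an alias the new column is named after the expression rather than totals, so an alias is added here:
df.select(df("k"), df("v"), (df("k") + df("v")).as("totals"))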
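When there are more than two columns to add up, writing the sum by hand gets tedious. A minimal sketch of one way to generalize the withColumn approach is to fold `+` over a list of column names. The object name `RowTotals`, the helper `withTotals`, and the local `SparkSession` setup are assumptions made for illustration and are not part of the original question; in spark-shell the session already exists.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RowTotals {
  // Hypothetical helper: appends a "totals" column holding the row-wise sum
  // of the named columns. `names` must be non-empty, since reduce fails on
  // an empty collection.
  def withTotals(df: DataFrame, names: Seq[String]): DataFrame =
    df.withColumn("totals", names.map(col).reduce(_ + _))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-totals")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1L, 0.1), (2L, 0.2), (3L, 0.3)).toDF("k", "v")

    // Sum every column of df; pass a subset of names to sum only some of them.
    withTotals(df, df.columns.toSeq).show()

    spark.stop()
  }
}
```

Note that a null in any of the summed columns makes the total for that row null, so nullable inputs may need to be wrapped (for example with `coalesce`) before summing.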