Global condition on multiple withColumn + when instructions on a Spark DataFrame
Consider this df:
+----+------+
|cond|chaine|
+----+------+
| 0| TF1|
| 1| TF1|
| 1| TNT|
+----+------+
I want to apply these withColumn instructions, but only to rows where cond == 1:
df.withColumn("New", when($"chaine" === "TF1", "YES!"))
  .withColumn("New2", when($"chaine" === "TF1", "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1", "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1", "YES4!"))
I can't use .filter, because I still want the rows with cond =!= 1 in the output.
I can do this by adding my condition to every when call:
df.withColumn("New", when($"chaine" === "TF1" && $"cond" === 1, "YES!"))
  .withColumn("New2", when($"chaine" === "TF1" && $"cond" === 1, "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1" && $"cond" === 1, "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1" && $"cond" === 1, "YES4!"))
But the problem is that I have a lot of new columns, and I would like a better solution (something like a global condition?).
Thanks.
A simple syntactic idea: wrap when in a helper that adds the condition for you.
def whenCondIs(n: Int)(condition: Column, value: Any): Column =
  when(condition && $"cond" === n, value)
def whenOne(condition: Column, value: Any): Column =
  whenCondIs(1)(condition, value)
Then:
df.withColumn("New", whenOne($"chaine" === "TF1", "YES!"))
  .withColumn("New2", whenOne($"chaine" === "TF1", "YES2!"))
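The helper above uses `$"cond"`, which only compiles when `spark.implicits._` is in scope. Here is a minimal self-contained sketch of the same idea that uses `col` instead. The object name `WhenOneDemo`, the `addColumns` method, and the local `SparkSession` settings are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object WhenOneDemo {
  // Wraps `when` so the guard on the `cond` column is written only once.
  def whenCondIs(n: Int)(condition: Column, value: Any): Column =
    when(condition && col("cond") === n, value)

  // Specialisation for the case in the question: cond == 1.
  def whenOne(condition: Column, value: Any): Column =
    whenCondIs(1)(condition, value)

  // Hypothetical helper that adds two of the question's columns.
  def addColumns(df: DataFrame): DataFrame =
    df.withColumn("New", whenOne(col("chaine") === "TF1", "YES!"))
      .withColumn("New2", whenOne(col("chaine") === "TF1", "YES2!"))

  def main(args: Array[String]): Unit = {
    // Local session for the demo only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("when-one-demo")
      .getOrCreate()
    import spark.implicits._

    // Same three rows as the df in the question.
    val df = Seq((0, "TF1"), (1, "TF1"), (1, "TNT")).toDF("cond", "chaine")
    addColumns(df).show()
    spark.stop()
  }
}
```

Because `when` has no `otherwise` branch here, every row that fails the combined condition (including all rows with cond =!= 1) stays in the output with null in the new columns.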
You can put the mapping between new columns, conditions, and values in a list, then use foldLeft to add them to your DataFrame. Like this:
val newCols = Seq(
  ("New", "chaine='TF1'", "YES!"),
  ("New2", "chaine='TF1'", "YES2!"),
  ("New3", "chaine='TF1'", "YES3!"),
  ("New4", "chaine='TF1'", "YES4!")
)
val df1 = newCols.foldLeft(df)((acc, x) =>
  acc.withColumn(x._1, when(expr(x._2) && col("cond") === 1, lit(x._3)))
)
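A possible variation on the foldLeft approach: the Spark documentation for `withColumn` notes that each call adds a projection and suggests a single `select` when adding many columns. The sketch below builds all guarded columns first and adds them in one `select`. The names `GuardedColumns` and `addGuarded`, and passing the shared condition as a `guard` parameter, are assumptions for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, expr, lit, when}

object GuardedColumns {
  // (new column name, SQL condition, value) triples, as in the list above.
  val newCols: Seq[(String, String, String)] = Seq(
    ("New", "chaine='TF1'", "YES!"),
    ("New2", "chaine='TF1'", "YES2!"),
    ("New3", "chaine='TF1'", "YES3!"),
    ("New4", "chaine='TF1'", "YES4!")
  )

  // Hypothetical helper: `guard` is the shared condition, combined with
  // every per-column condition, so it is stated once by the caller.
  def addGuarded(df: DataFrame, guard: Column): DataFrame = {
    val added: Seq[Column] = newCols.map { case (name, condition, value) =>
      when(expr(condition) && guard, lit(value)).as(name)
    }
    // Keep all existing columns and append the new ones in a single projection.
    df.select(col("*") +: added: _*)
  }
}
```

With the df from the question, this would be called as `GuardedColumns.addGuarded(df, col("cond") === 1)`. Rows that fail either condition are kept, with null in the new columns.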