Spark conditional replacement but keep field values
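I want to conditionally fill missing (null) values in Spark, so that I handle each corner case of my data explicitly instead of filling everything with a single replacement value.
A sample could look like this: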
case class FooBar(foo:String, bar:String)
val myDf = Seq(("a","first"),("b","second"),("c",null), ("third","fooBar"), ("someMore",null))
.toDF("foo","bar")
.as[FooBar]
+--------+------+
| foo| bar|
+--------+------+
| a| first|
| b|second|
| c| null|
| third|fooBar|
|someMore| null|
+--------+------+
Unfortunately,
myDf
.withColumn(
"bar",
when(
(($"foo" === "c") and ($"bar" isNull)) , "someReplacement"
)
).show
resets all the other, regular values in the column to null:
+--------+---------------+
| foo| bar|
+--------+---------------+
| a| null|
| b| null|
| c|someReplacement|
| third| null|
|someMore| null|
+--------+---------------+
and
myDf
.withColumn(
"bar",
when(
(($"foo" === "c") and ($"bar" isNull)) or
(($"foo" === "someMore") and ($"bar" isNull)), "someReplacement"
)
).show
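which is what I would really like to use to fill values for different classes/categories of foo, does not work either.
I am curious how to fix this.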
Use otherwise (without it, when returns null for every row that does not match the condition):
when(
(($"foo" === "c") and ($"bar" isNull)) or
(($"foo" === "someMore") and ($"bar" isNull)), "someReplacement"
).otherwise($"bar")
or coalesce:
coalesce(
$"bar",
when(($"foo" === "c") or ($"foo" === "someMore"), "someReplacement")
)
The reason to prefer coalesce is simply less typing (you don't have to repeat $"bar" isNull).
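To try both variants end to end, here is a minimal sketch. It assumes a local SparkSession, and the object name ConditionalFill, the helper names viaOtherwise and viaCoalesce, and the targets list are made up for illustration. It also uses col(...) and isin(...) instead of the $"..." syntax so the helpers work without importing spark.implicits.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, when}

object ConditionalFill {
  // Hypothetical list of foo categories whose null bar should be filled.
  val targets: Seq[String] = Seq("c", "someMore")

  // Variant 1: when(...).otherwise(...) keeps the original bar
  // for every row that does not match the condition.
  def viaOtherwise(df: DataFrame): DataFrame =
    df.withColumn(
      "bar",
      when(col("foo").isin(targets: _*) && col("bar").isNull, "someReplacement")
        .otherwise(col("bar"))
    )

  // Variant 2: coalesce returns the first non-null argument, so the
  // when(...) branch is only consulted for rows where bar is null.
  def viaCoalesce(df: DataFrame): DataFrame =
    df.withColumn(
      "bar",
      coalesce(col("bar"), when(col("foo").isin(targets: _*), "someReplacement"))
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demo.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("conditional-fill")
      .getOrCreate()
    import spark.implicits._

    val myDf = Seq(
      ("a", "first"), ("b", "second"), ("c", null),
      ("third", "fooBar"), ("someMore", null)
    ).toDF("foo", "bar")

    viaOtherwise(myDf).show()
    viaCoalesce(myDf).show()

    spark.stop()
  }
}
```

Under these assumptions both variants should leave the rows for a, b and third unchanged and fill only the null bar values for c and someMore. A null bar for any other foo stays null in both cases.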