Spark conditional replacement but keep field values

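I want to conditionally fill NaN (null) values in Spark, to make sure I have considered each corner case of my data rather than simply filling everything with one replacement value.

A sample could look like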

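import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

case class FooBar(foo:String, bar:String)
// both "c" and "someMore" carry a real null in bar
val myDf = Seq(("a","first"),("b","second"),("c",null), ("third","fooBar"), ("someMore",null))
         .toDF("foo","bar")
         .as[FooBar]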

+--------+------+
|     foo|   bar|
+--------+------+
|       a| first|
|       b|second|
|       c|  null|
|   third|fooBar|
|someMore|  null|
+--------+------+

Unfortunately,

    myDf
        .withColumn(
          "bar",
          when(
            (($"foo" === "c") and ($"bar" isNull)) , "someReplacement" 
          )
        ).show

resets all the other, regular values in the column to null:

+--------+---------------+
|     foo|            bar|
+--------+---------------+
|       a|           null|
|       b|           null|
|       c|someReplacement|
|   third|           null|
|someMore|           null|
+--------+---------------+

myDf
    .withColumn(
      "bar",
      when(
        (($"foo" === "c") and ($"bar" isNull)) or
        (($"foo" === "someMore") and ($"bar" isNull)), "someReplacement" 
      )
    ).show

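This variant, which is what I would really like to use to fill in values for different classes / categories of foo, does not work well either.

I am curious how to fix this.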

Use otherwise, so that rows which do not match the condition keep their current bar value:

when(
  (($"foo" === "c") and ($"bar" isNull)) or
  (($"foo" === "someMore") and ($"bar" isNull)), "someReplacement" 
).otherwise($"bar")
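For anyone who wants to try this end to end, here is a minimal sketch that applies the otherwise expression to the sample data from the question. The local SparkSession setup, the app name, and the use of isin to shorten the condition are my own assumptions and are not part of the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

// Assumption: a local session for experimenting; in spark-shell getOrCreate reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("conditional-fill").getOrCreate()
import spark.implicits._

// Sample data from the question, with real nulls in bar for "c" and "someMore".
val df = Seq(("a", "first"), ("b", "second"), ("c", null), ("third", "fooBar"), ("someMore", null))
  .toDF("foo", "bar")

// Replace bar only where it is null AND foo is one of the chosen categories.
// Every other row falls through to otherwise and keeps its current value.
val filled = df.withColumn(
  "bar",
  when($"foo".isin("c", "someMore") && $"bar".isNull, "someReplacement")
    .otherwise($"bar")
)

filled.show()
```

With this, only the rows for c and someMore change; a, b and third keep first, second and fooBar.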

Or use coalesce:

coalesce(
  $"bar",  
  when(($"foo" === "c") or ($"foo" === "someMore"), "someReplacement")
)

The reason to use coalesce is simply less typing (you don't have to repeat $"bar" isNull).
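Since the question mentions filling different values for different categories of foo, the coalesce form could be extended roughly as follows. This is only a sketch: the replacements list and its values are hypothetical, and the local SparkSession setup is an assumption for running it outside spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, lit, when}

// Assumption: a local session for experimenting; in spark-shell getOrCreate reuses the existing one.
val spark = SparkSession.builder().master("local[*]").appName("conditional-fill").getOrCreate()
import spark.implicits._

val df = Seq(("a", "first"), ("b", "second"), ("c", null), ("third", "fooBar"), ("someMore", null))
  .toDF("foo", "bar")

// Hypothetical per-category defaults, not taken from the original post.
val replacements = Seq("c" -> "replacementForC", "someMore" -> "replacementForSomeMore")

// One when(...) per category. A when without otherwise yields null for
// non-matching rows, so coalesce simply moves on to the next candidate.
val fallbacks = replacements.map { case (category, value) =>
  when($"foo" === category, lit(value))
}

// coalesce returns the first non-null argument: the existing bar if present,
// otherwise the replacement for the row's category, otherwise null.
val filled = df.withColumn("bar", coalesce(($"bar" +: fallbacks): _*))

filled.show()
```

Rows whose bar is already set are untouched, and a null bar in a category without an entry in replacements stays null, so unhandled corner cases remain visible.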