How to update/transform/replace Spark DataFrame column values using a HashMap

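I want to replace the values of a given DataFrame column using a HashMap, but I'm having trouble with the syntax. Can someone point me in the right direction or to an existing example? I have searched but couldn't find anything that addresses this exact topic.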

Edit:

Imagine a DataFrame like the following:

+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Simpson|BeerDrinker|
|      Marge| Simpson|  Housewife|
|       Bart| Simpson|        Son|
|       Lisa| Simpson|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

I have a map of key-value pairs, as follows:

  type ValueMap = scala.collection.mutable.HashMap[String, String]
  var mymap = new ValueMap()
  mymap += ("Simpson" -> "Surname")

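I want to perform an operation (which I haven't been able to figure out yet) that produces the result below. Basically, in the Pronoun column, every value equal to Simpson has been replaced by its corresponding value in the map mymap, i.e. Surname, while values with no matching key (such as Simpsons) are left unchanged.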
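To pin down the intended semantics before involving Spark, here is a minimal plain-Scala sketch (no Spark) of the same exact-match lookup on an ordinary Seq standing in for the Pronoun column. The pronouns sample is copied from the table above, and the names lookup, pronouns and replaced are hypothetical. It also shows that the question's mutable HashMap can be turned into an immutable Map with .toMap, which is the map type the answers use. Simpsons stays as it is because the lookup is an exact key match, not a substring match.

```scala
import scala.collection.mutable

// The question's map type: a mutable HashMap from old value to new value.
type ValueMap = mutable.HashMap[String, String]
val mymap = new ValueMap()
mymap += ("Simpson" -> "Surname")

// Convert to an immutable Map (the type used by the answers).
val lookup: Map[String, String] = mymap.toMap

// Hypothetical stand-in for the Pronoun column, taken from the table above.
val pronouns = Seq("Simpson", "Simpson", "Simpson", "Simpson", "Simpsons")

// Exact-key lookup; a value with no key in the map is kept unchanged.
val replaced = pronouns.map(p => lookup.getOrElse(p, p))
// replaced == Seq("Surname", "Surname", "Surname", "Surname", "Simpsons")
```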

+-----------+--------+-----------+
|       Noun| Pronoun|  Adjective|
+-----------+--------+-----------+
|      Homer| Surname|BeerDrinker|
|      Marge| Surname|  Housewife|
|       Bart| Surname|        Son|
|       Lisa| Surname|   Daughter|
|TheSimpsons|Simpsons|     Family|
+-----------+--------+-----------+

Try this approach with a UDF:

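import org.apache.spark.sql.functions.{typedLit, udf, when}
import spark.implicits._ // toDF and $"col" syntax; assumes a SparkSession named spark, as in spark-shell
val myMap = Map("Simpson" -> "Surname")
val df = Seq(("Homer","Simpson","BeerDrinker"),("Marge","Simpson","Housewife"),("Bart","Simpson","Son"),("Lisa","Simpson","Daughter"),("TheSimpsons","Simpsons","Family")).toDF("Noun","Pronoun","Adjective")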

df.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Simpson |BeerDrinker|
|Marge      |Simpson |Housewife  |
|Bart       |Simpson |Son        |
|Lisa       |Simpson |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

// Look up each value in myMap; keep the original value when it is not a key
val getVal = udf((x: String) => myMap.getOrElse(x, x))
val resDF = df.withColumn("Pronoun", getVal($"Pronoun"))

resDF.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+
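If only exact-match replacement is needed, Spark also has a built-in for it: DataFrameNaFunctions.replace (called as df.na.replace), which takes a column name and a Map from old to new values and leaves values that are not keys unchanged. A minimal self-contained sketch, assuming a local SparkSession (in spark-shell, spark already exists and the builder line can be dropped); replacedDF is a hypothetical name:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local SparkSession; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("na-replace-sketch").getOrCreate()
import spark.implicits._

val myMap = Map("Simpson" -> "Surname")
val df = Seq(
  ("Homer", "Simpson", "BeerDrinker"),
  ("Marge", "Simpson", "Housewife"),
  ("Bart", "Simpson", "Son"),
  ("Lisa", "Simpson", "Daughter"),
  ("TheSimpsons", "Simpsons", "Family")
).toDF("Noun", "Pronoun", "Adjective")

// na.replace(col, map) swaps values that exactly equal a key of the map;
// keys and values must have the same type (String here).
val replacedDF = df.na.replace("Pronoun", myMap)
replacedDF.show(false)
```

Because this is a built-in rather than a UDF, Spark's optimizer can generally see into it, whereas a UDF is treated as a black box.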

Let me know if this helps.

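Update:

Without a UDF: add the map to the DataFrame as an additional column, look up each Pronoun value in it, then drop the map column: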

val df1 = df.withColumn("map", typedLit(myMap))
// Use the mapped value when the key exists (lookup is non-null); otherwise keep the original
val df2 = df1.withColumn("Pronoun", when($"map"($"Pronoun").isNotNull, $"map"($"Pronoun")).otherwise($"Pronoun")).drop("map")
df2.show(false)

+-----------+--------+-----------+
|Noun       |Pronoun |Adjective  |
+-----------+--------+-----------+
|Homer      |Surname |BeerDrinker|
|Marge      |Surname |Housewife  |
|Bart       |Surname |Son        |
|Lisa       |Surname |Daughter   |
|TheSimpsons|Simpsons|Family     |
+-----------+--------+-----------+

Another simple way, without adding a new column:

val colMap = typedLit(myMap)
val df3 = df.withColumn("Pronoun", when(colMap($"Pronoun").isNotNull, colMap($"Pronoun")).otherwise($"Pronoun"))
df3.show(false)
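The when(...isNotNull...).otherwise(...) pattern can likely be written more compactly with coalesce, which returns its first non-null argument: a lookup of a key that is not in the map yields null, so coalesce falls back to the original Pronoun value. This relies on the same missing-key-returns-null behavior as the version above. A minimal sketch, assuming a local SparkSession; df4 is a hypothetical name:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, typedLit}

// Assumption: a local SparkSession; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
import spark.implicits._

val myMap = Map("Simpson" -> "Surname")
val df = Seq(
  ("Homer", "Simpson", "BeerDrinker"),
  ("Marge", "Simpson", "Housewife"),
  ("Bart", "Simpson", "Son"),
  ("Lisa", "Simpson", "Daughter"),
  ("TheSimpsons", "Simpsons", "Family")
).toDF("Noun", "Pronoun", "Adjective")

// Map literal column; colMap(key) looks the key up and yields null when it is absent.
val colMap = typedLit(myMap)

// coalesce takes the first non-null value: the mapped value if present, else the original.
val df4 = df.withColumn("Pronoun", coalesce(colMap($"Pronoun"), $"Pronoun"))
df4.show(false)
```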