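How to update/transform/replace Spark DataFrame column values using a HashMap
I would like to replace the values of a given DataFrame column using a HashMap, but I am struggling with the syntax.
Can anyone point me in the right direction or to an existing example? I have searched but could not find anything that addresses this exact topic.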
Edit:
Imagine a DataFrame like the following:
+-----------+--------+-----------+
| Noun| Pronoun| Adjective|
+-----------+--------+-----------+
| Homer| Simpson|BeerDrinker|
| Marge| Simpson| Housewife|
| Bart| Simpson| Son|
| Lisa| Simpson| Daughter|
|TheSimpsons|Simpsons| Family|
+-----------+--------+-----------+
I have a map of key-value pairs, as follows:
type ValueMap = scala.collection.mutable.HashMap [String,String]
var mymap = new ValueMap ()
mymap += ("Simpson" -> "Surname")
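I want to perform an operation (which I have not been able to figure out so far) that produces the result shown below. Essentially, in the Pronoun column, every value equal to Simpson is replaced by the corresponding value from the map mymap, i.e. Surname: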
+-----------+--------+-----------+
| Noun| Pronoun| Adjective|
+-----------+--------+-----------+
| Homer| Surname|BeerDrinker|
| Marge| Surname| Housewife|
| Bart| Surname| Son|
| Lisa| Surname| Daughter|
|TheSimpsons|Simpsons| Family|
+-----------+--------+-----------+
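Try this approach with a UDF:
// Outside spark-shell these imports are needed (spark is the active SparkSession).
import org.apache.spark.sql.functions.{typedLit, udf, when}
import spark.implicits._
val myMap = Map("Simpson" -> "Surname")
val df = Seq(
  ("Homer", "Simpson", "BeerDrinker"),
  ("Marge", "Simpson", "Housewife"),
  ("Bart", "Simpson", "Son"),
  ("Lisa", "Simpson", "Daughter"),
  ("TheSimpsons", "Simpsons", "Family")
).toDF("Noun", "Pronoun", "Adjective")
df.show(false)
+-----------+--------+-----------+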
|Noun |Pronoun |Adjective |
+-----------+--------+-----------+
|Homer |Simpson |BeerDrinker|
|Marge |Simpson |Housewife |
|Bart |Simpson |Son |
|Lisa |Simpson |Daughter |
|TheSimpsons|Simpsons|Family |
+-----------+--------+-----------+
val getVal = udf((x: String) => myMap.getOrElse(x, x))
val resDF = df.withColumn("Pronoun", getVal($"Pronoun"))
resDF.show(false)
+-----------+--------+-----------+
|Noun |Pronoun |Adjective |
+-----------+--------+-----------+
|Homer |Surname |BeerDrinker|
|Marge |Surname |Housewife |
|Bart |Surname |Son |
|Lisa |Surname |Daughter |
|TheSimpsons|Simpsons|Family |
+-----------+--------+-----------+
Let me know if it helps.
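The UDF above wraps a plain lookup with a fallback to the original value. Since the question builds a mutable HashMap while the answer uses an immutable Map, the following is a minimal sketch of bridging the two in plain Scala, without Spark. The names LookupSketch, lookup, and replaceValue are hypothetical and only serve the illustration.

```scala
import scala.collection.mutable

// Hypothetical names; this only illustrates the lookup logic the UDF wraps.
object LookupSketch {
  // The mutable map as declared in the question.
  val mymap: mutable.HashMap[String, String] = mutable.HashMap("Simpson" -> "Surname")

  // Immutable snapshot: later changes to mymap do not affect it.
  val lookup: Map[String, String] = mymap.toMap

  // Same body as the UDF: return the mapped value, or the input when no key matches.
  def replaceValue(x: String): String = lookup.getOrElse(x, x)

  def main(args: Array[String]): Unit = {
    println(replaceValue("Simpson"))  // key present: mapped value
    println(replaceValue("Simpsons")) // key absent: input returned unchanged
  }
}
```

Taking the snapshot with toMap before defining the UDF is a design choice assumed here: it keeps the function passed to udf independent of later mutations of mymap.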
Update:
Without a UDF, add the map to the DataFrame as an extra column:
val df1 = df.withColumn("map", typedLit(myMap))
val df2 = df1.withColumn("Pronoun", when($"map"($"Pronoun").isNotNull, $"map"($"Pronoun")).otherwise($"Pronoun") ).drop("map")
df2.show(false)
+-----------+--------+-----------+
|Noun |Pronoun |Adjective |
+-----------+--------+-----------+
|Homer |Surname |BeerDrinker|
|Marge |Surname |Housewife |
|Bart |Surname |Son |
|Lisa |Surname |Daughter |
|TheSimpsons|Simpsons|Family |
+-----------+--------+-----------+
Another simple way, without adding a new column:
val colMap = typedLit(myMap)
val df3 = df.withColumn("Pronoun", when(colMap($"Pronoun").isNotNull, colMap($"Pronoun")).otherwise($"Pronoun") )
df3.show(false)
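The when/isNotNull/otherwise pattern above can also be written more compactly. The sketch below shows two possible variants: coalesce over the map lookup, and the built-in DataFrameNaFunctions.replace. It assumes that looking up a missing key in a map column yields null (the same assumption the when-based version relies on); the object and method names (ReplaceWithMap, viaCoalesce, viaNaReplace) and the local SparkSession setup are hypothetical and not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, typedLit}

// Hypothetical helper object; names are illustrative only.
object ReplaceWithMap {

  // Look each value up in the map literal. A missing key is assumed to yield null,
  // in which case coalesce falls back to the original column value.
  def viaCoalesce(df: DataFrame, column: String, mapping: Map[String, String]): DataFrame = {
    val colMap = typedLit(mapping)
    df.withColumn(column, coalesce(colMap(col(column)), col(column)))
  }

  // Built-in alternative: replace values of one column that match a key in the map.
  def viaNaReplace(df: DataFrame, column: String, mapping: Map[String, String]): DataFrame =
    df.na.replace(column, mapping)

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("replace-with-map")
      .getOrCreate()
    import spark.implicits._

    val myMap = Map("Simpson" -> "Surname")
    val df = Seq(
      ("Homer", "Simpson", "BeerDrinker"),
      ("TheSimpsons", "Simpsons", "Family")
    ).toDF("Noun", "Pronoun", "Adjective")

    viaCoalesce(df, "Pronoun", myMap).show(false)
    viaNaReplace(df, "Pronoun", myMap).show(false)

    spark.stop()
  }
}
```

Both variants avoid a UDF, so the replacement stays visible to Spark's optimizer. na.replace is the shortest option when the goal is an exact value-for-value substitution in a named column.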