Create a new column using a map
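Is there a way (without using a UDF) to take an existing DataFrame and create a new column by looking up the values of an existing column in a map?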
df.withColumn("newCol", transform(col("existing").using(map)))
where map has the same key type as existing, and its values are the output I want.
You can convert the Map to a DataFrame and join:
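// toDF and the $"..." syntax need the session implicits in scope
// (spark-shell imports them automatically).
val df = sc.parallelize(Seq(
  (1, "foo"), (2, "bar"), (3, "foobar")
)).toDF("id", "existing")
val map = Map("foo" -> 1, "bar" -> 2)
// Turn the Map into a two-column lookup table.
val lookup = sc.parallelize(map.toSeq).toDF("key", "value")
// A left join keeps rows without a match; their newCol is null.
df
  .join(lookup, $"existing" <=> $"key", "left")
  .drop("key")
  .withColumnRenamed("value", "newCol")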
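When the map is small, the same join can carry a broadcast hint so that the large side does not need to be shuffled. The following is a minimal sketch under assumptions rather than part of the answer above: it builds its own local SparkSession, and the names `mapping` and `joined` are chosen here for illustration. `broadcast` is the hint function from `org.apache.spark.sql.functions`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("map-join-sketch").getOrCreate()
import spark.implicits._

val df = Seq((1, "foo"), (2, "bar"), (3, "foobar")).toDF("id", "existing")
val mapping = Map("foo" -> 1, "bar" -> 2)

// Small lookup table built from the Map.
val lookup = mapping.toSeq.toDF("key", "value")

// Hint that `lookup` is small enough to ship to every executor.
val joined = df
  .join(broadcast(lookup), $"existing" <=> $"key", "left")
  .drop("key")
  .withColumnRenamed("value", "newCol")
```

Rows whose `existing` value has no key in the map (here `"foobar"`) keep a null `newCol`, as in the unhinted join.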
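// Alternative: map over the rows and look each value up in a plain Scala Map.
// `sqtx` is presumably the SQLContext in scope (needed for toDF).
import sqtx.implicits._
val x = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)
val df = sc.parallelize(Seq(
  (1, "foo"), (2, "bar"), (3, "foobar")
)).toDF("id", "existing")
// Keys missing from the map fall back to 0.
df.map(r => (r.getInt(0), x.getOrElse(r.getString(1), 0))).toDF("id", "new")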
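A third option that stays inside the Column API (no UDF, no join, no row-level `map`) is to turn the Scala Map into a literal map column and index it with the existing column. This is a sketch under assumptions, not something taken from the answers above: it creates its own local SparkSession, the names `mapping`, `mapCol` and `result` are illustrative, and it assumes default (non-ANSI) settings, where a missing key is expected to yield null before `coalesce` supplies the default.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, map}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("map-column-sketch").getOrCreate()
import spark.implicits._

val df = Seq((1, "foo"), (2, "bar"), (3, "foobar")).toDF("id", "existing")
val mapping = Map("foo" -> 1, "bar" -> 2, "baz" -> 3)

// functions.map expects alternating key and value columns: k1, v1, k2, v2, ...
val mapCol = map(mapping.toSeq.flatMap { case (k, v) => Seq(lit(k), lit(v)) }: _*)

// Index the map column with the existing column; fall back to 0 for missing keys.
val result = df.select(
  col("id"),
  coalesce(mapCol(col("existing")), lit(0)).as("new")
)
```

Because the whole map is embedded in the query plan as literals, this is best suited to small maps; for large ones the join approach is likely the better fit.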