How to join a specific column of one DataFrame with another in Scala Spark
I have two DataFrames.
df1 is:
name city
--------------------------------
kum chennai
kamesh bangalore
df2 is:
name street
-------------------------------
kum 2nd str
kamesh 10th str
I need to combine the city and street for each name.
The output DataFrame should look like this:
df3 =
name street city
-----------------------------
kum 2nd str Chennai
kamesh 10th str bangalore.
How can I produce df3 using Scala?
Join them as follows:
val df3 = df1.join(df2, Seq("name"))
By default this is an inner join. You can also specify the join type explicitly:
val df3 = df1.join(df2, Seq("name"), "inner")
Your output should be:
+------+---------+--------+
|name |city |street |
+------+---------+--------+
|kum |chennai |2nd str |
|kamesh|bangalore|10th str|
+------+---------+--------+
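The question asks for the columns in the order name, street, city, while the join above returns name, city, street. A minimal sketch of reordering with `select`, assuming a local SparkSession (the app name and `local[*]` master are illustrative choices, not from the original post):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("join-example")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data copied from the question.
val df1: DataFrame = Seq(("kum", "chennai"), ("kamesh", "bangalore")).toDF("name", "city")
val df2: DataFrame = Seq(("kum", "2nd str"), ("kamesh", "10th str")).toDF("name", "street")

// Join on the shared "name" column, then pick the column order the question asks for.
val df3 = df1.join(df2, Seq("name"), "inner").select("name", "street", "city")
df3.show()
```

The `select` only changes column order; the joined rows are the same as in the output above.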
You can use this:
val df3 = df1.join(df2, df1("name").equalTo(df2("name")))
But it will show the join key twice.
val df4 = df1.join(df2, Seq("name"), "inner")
This shows the join key only once.
Full code:
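import org.apache.spark.sql.DataFrame
import spark.implicits._  // assumes an existing SparkSession named spark, as in spark-shell
val df1: DataFrame = Seq(("kum","chennai"),("kamesh","bangalore")).toDF("name","city")
val df2: DataFrame = Seq(("kum","2nd str"),("kamesh","10th str")).toDF("name","street")
// Joining on a column expression keeps both "name" columns
val df3 = df1.join(df2, df1("name").equalTo(df2("name")))
df3.show()
// Joining on Seq("name") keeps a single "name" column
val df4 = df1.join(df2, Seq("name"), "inner")
df4.show()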
Result:
+------+---------+------+--------+
| name| city| name| street|
+------+---------+------+--------+
| kum| chennai| kum| 2nd str|
|kamesh|bangalore|kamesh|10th str|
+------+---------+------+--------+
+------+---------+--------+
| name| city| street|
+------+---------+--------+
| kum| chennai| 2nd str|
|kamesh|bangalore|10th str|
+------+---------+--------+
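If the expression form of the join is needed (for example, when the key columns have different names), the duplicated key can be removed afterwards with `drop`. A minimal sketch, assuming a local SparkSession; the name `df5` and the session settings are hypothetical, not from the original answer:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .appName("join-drop-example")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data copied from the question.
val df1: DataFrame = Seq(("kum", "chennai"), ("kamesh", "bangalore")).toDF("name", "city")
val df2: DataFrame = Seq(("kum", "2nd str"), ("kamesh", "10th str")).toDF("name", "street")

// The expression join keeps both "name" columns; drop the copy that came from df2.
val df5 = df1.join(df2, df1("name") === df2("name")).drop(df2("name"))
df5.show()
```

Passing the column reference `df2("name")` rather than the string `"name"` tells Spark which of the two identically named columns to remove.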