How to join a specific column of a DataFrame with another DataFrame in Scala Spark

I have two DataFrames.

df1 is:

name       city
--------------------------------
kum        chennai
kamesh     bangalore

df2 is:

name       street
-------------------------------
kum        2nd str
kamesh     10th str

I need each name together with its city and street. The output DataFrame, df3, should look like this:

name       street     city
-----------------------------
kum        2nd str    chennai
kamesh     10th str   bangalore

How can I produce df3 using Scala?

Join them as follows:

val df3 = df1.join(df2, Seq("name"))

By default this is an inner join. You can also state the join type explicitly:

val df3 = df1.join(df2, Seq("name"), "inner")

Your output should be:

+------+---------+--------+
|name  |city     |street  |
+------+---------+--------+
|kum   |chennai  |2nd str |
|kamesh|bangalore|10th str|
+------+---------+--------+
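The output above lists the columns as name, city, street, while the question asks for name, street, city. As a minimal sketch of how the join-type argument and the column order could be handled together, the example below wraps the join in a helper. The names `JoinByName` and `joinByName`, the local SparkSession settings, and the extra ("ravi", "mumbai") row are assumptions made for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinByName {
  // Hypothetical helper (not from the original answer): joins on the shared
  // "name" column with the given join type, then orders the columns as
  // name, street, city to match the layout asked for in the question.
  def joinByName(df1: DataFrame, df2: DataFrame, joinType: String): DataFrame =
    df1.join(df2, Seq("name"), joinType).select("name", "street", "city")

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession is enough for this small demo.
    val spark = SparkSession.builder()
      .appName("JoinByName")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The ("ravi", "mumbai") row is invented here so that one name has no street.
    val df1 = Seq(("kum", "chennai"), ("kamesh", "bangalore"), ("ravi", "mumbai"))
      .toDF("name", "city")
    val df2 = Seq(("kum", "2nd str"), ("kamesh", "10th str"))
      .toDF("name", "street")

    // Inner join: only names present in both DataFrames are kept.
    joinByName(df1, df2, "inner").show()

    // Left outer join: every row of df1 is kept; street is null where df2 has no match.
    joinByName(df1, df2, "left_outer").show()

    spark.stop()
  }
}
```

With "inner", the invented row is dropped because df2 has no street for it. With "left_outer", it is kept and its street is null. The trailing `select` only reorders the columns and does not change which rows are returned.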

You can use either of these:

  1. val df3 = df1.join(df2, df1("name").equalTo(df2("name"))), but this shows the join key twice.

  2. val df4 = df1.join(df2, Seq("name"), "inner"), which shows the join key only once.

The code is as follows:

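import org.apache.spark.sql.DataFrame
// Assumes a SparkSession named spark is already in scope (as in spark-shell)
import spark.implicits._

val df1: DataFrame = Seq(("kum", "chennai"), ("kamesh", "bangalore")).toDF("name", "city")
val df2: DataFrame = Seq(("kum", "2nd str"), ("kamesh", "10th str")).toDF("name", "street")

// Join on a column expression: both "name" columns appear in the result
val df3 = df1.join(df2, df1("name").equalTo(df2("name")))
df3.show()

// Join on a column-name sequence: "name" appears only once
val df4 = df1.join(df2, Seq("name"), "inner")
df4.show()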

Result:

+------+---------+------+--------+
|  name|     city|  name|  street|
+------+---------+------+--------+
|   kum|  chennai|   kum| 2nd str|
|kamesh|bangalore|kamesh|10th str|
+------+---------+------+--------+
+------+---------+--------+
|  name|     city|  street|
+------+---------+--------+
|   kum|  chennai| 2nd str|
|kamesh|bangalore|10th str|
+------+---------+--------+
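If the expression form of the join is preferred (for example because the key columns have different names in the two DataFrames), one possible way to avoid the repeated key is to drop the right-hand copy after joining. The sketch below is a suggestion rather than part of the original answer. The names `DropDuplicateKey` and `joinAndDropRightKey` and the local SparkSession settings are assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropDuplicateKey {
  // Hypothetical helper: joins on an explicit column expression, then drops
  // df2's copy of "name" so that the key appears only once in the result.
  def joinAndDropRightKey(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("name") === df2("name")).drop(df2("name"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession is enough for this small demo.
    val spark = SparkSession.builder()
      .appName("DropDuplicateKey")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(("kum", "chennai"), ("kamesh", "bangalore")).toDF("name", "city")
    val df2 = Seq(("kum", "2nd str"), ("kamesh", "10th str")).toDF("name", "street")

    val joined = joinAndDropRightKey(df1, df2)
    joined.printSchema() // remaining columns: name, city, street
    joined.show()

    spark.stop()
  }
}
```

Passing the column reference `df2("name")` to `drop`, rather than the string "name", tells Spark which of the two identically named columns to remove.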