How to create a DataFrame from a List[Map] based on a condition
I have a DataFrame named DF1, shown below.
DF1:
+----------+----------+----------+
|srcColumnZ|srcColumnY|srcColumnR|
+----------+----------+----------+
|John      |Non Hf    |New york  |
|Steav     |Non Hf    |Mumbai    |
|Ram       |HF        |Boston    |
+----------+----------+----------+
I also have a list of maps that holds the source-to-target column mapping, as shown below.
List(Map(targetColumn -> columnNameX, sourceColumn -> List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR)), Map(targetColumn -> columnNameY, sourceColumn -> List(srcColumnY)), Map(targetColumn -> columnNameZ, selectvalue -> 5))
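I want to create a DataFrame based on the list of maps above. It should have columnNameX, columnNameY and columnNameZ as columns (per the list above), and the values of each column depend on sourceColumn. That is, if sourceColumn is present, e.g. List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR), the candidates are checked one by one against the columns of DF1, and as soon as the first one matches, all values of that column are copied into the target column. The same applies to the next target column. If selectvalue is present instead of sourceColumn, that value is hard-coded into the whole column. For example, the target column columnNameZ in the list above has selectvalue 5.
Below is the expected output.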
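To make the rule concrete, here is a minimal sketch in plain Scala (no Spark) of how one mapping entry could be resolved against a list of column names. The names `MappingRuleSketch`, `Resolved`, `FromColumn`, `Constant` and `resolve` are hypothetical and introduced only for illustration. The sketch assumes a strict first-match reading of the rule, in candidate-list order.

```scala
object MappingRuleSketch {
  // Hypothetical result types, introduced only for this illustration.
  sealed trait Resolved
  final case class FromColumn(target: String, source: String) extends Resolved
  final case class Constant(target: String, value: Any) extends Resolved

  // Resolve one mapping entry against the column names that actually exist.
  // Returns None when the entry has no target, or no candidate matches.
  def resolve(mapping: Map[String, Any], available: Seq[String]): Option[Resolved] = {
    val target = mapping.get("targetColumn").map(_.toString)
    (target, mapping.get("sourceColumn"), mapping.get("selectvalue")) match {
      case (Some(t), Some(candidates: Seq[_]), _) =>
        // First candidate, in list order, that is present among the columns.
        candidates.map(_.toString).find(c => available.contains(c)).map(FromColumn(t, _))
      case (Some(t), None, Some(v)) =>
        // No sourceColumn: the constant is used for every row.
        Some(Constant(t, v))
      case _ => None
    }
  }

  def main(args: Array[String]): Unit = {
    val dfCols = Seq("srcColumnZ", "srcColumnY", "srcColumnR")
    val mapList: List[Map[String, Any]] = List(
      Map("targetColumn" -> "columnNameX",
          "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")),
      Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")),
      Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5)
    )
    mapList.map(resolve(_, dfCols)).foreach(println)
  }
}
```

Under this reading, columnNameX resolves to srcColumnY, because srcColumnY comes before srcColumnZ in the candidate list and is the first candidate present in DF1.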
+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
|John       |Non Hf     |5          |
|Steav      |Non Hf     |5          |
|Ram        |HF         |5          |
+-----------+-----------+-----------+
The main thing here is to generate a query list from the given map list. You can do it like below.
import org.apache.spark.sql.functions.lit
// Needed for toDF; assumes a SparkSession named spark (already in scope in spark-shell)
import spark.implicits._

// Input DF
val df = Seq(("John", "Non Hf", "New york"), ("Steav", "Non Hf", "Mumbai"), ("Ram", "HF", "Boston")).toDF("srcColumnZ", "srcColumnY", "srcColumnR")
// Input list
val mapList = List(Map("targetColumn" -> "columnNameX", "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")), Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")), Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5))
// Get all the columns of df as a list
val dfCols = df.columns.toList
// Then generate the query list like below
val query = mapList.map { mp =>
  val target = mp.getOrElse("targetColumn", "No Target column found").toString
  if (mp.contains("sourceColumn")) {
    // The map values are not typed as List[String], so the candidates are recovered from the string form
    val srcColumn = mp.getOrElse("sourceColumn", "sourceColumn key not found").toString.replace("List(", "").replace(")", "").split(",").map(_.trim).toList
    // First candidate, in list order, that exists in df; .head throws if none matches
    val srcCol = srcColumn.filter(dfCols.contains(_)).head
    df.col(srcCol).alias(target)
  } else {
    // No sourceColumn: use selectvalue as a constant (string) column
    lit(mp.getOrElse("selectvalue", "selectvalue key not found").toString.replace("(", "").replace(")", "").trim).alias(target)
  }
}
// Finally, fire the query
df.select(query: _*).show
//Sample output:
+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
| Non Hf| Non Hf| 5|
| Non Hf| Non Hf| 5|
| HF| HF| 5|
+-----------+-----------+-----------+
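In the output above, columnNameX carries the srcColumnY values because srcColumnY is the first candidate in its list that exists in the DataFrame. As a possible variation on the same idea, the following sketch pattern matches on the map values instead of parsing their string form, keeps the literal's original type, and fails with an explicit message when an entry cannot be resolved. The object name `SelectFromMappings`, the helper `buildColumns`, and the local SparkSession settings are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object SelectFromMappings {
  // Build one Column per mapping entry; throws if an entry cannot be resolved.
  def buildColumns(df: DataFrame, mappings: Seq[Map[String, Any]]): Seq[Column] = {
    val available = df.columns.toSet
    mappings.map { mp =>
      val target = mp.getOrElse("targetColumn",
        throw new IllegalArgumentException(s"No targetColumn in $mp")).toString
      (mp.get("sourceColumn"), mp.get("selectvalue")) match {
        case (Some(candidates: Seq[_]), _) =>
          // First candidate, in list order, that exists in the DataFrame.
          candidates.map(_.toString).find(available.contains) match {
            case Some(src) => col(src).alias(target)
            case None => throw new IllegalArgumentException(
              s"None of the source columns for $target exist in the DataFrame")
          }
        case (_, Some(value)) =>
          // Constant column; lit keeps the value's type (an Int stays an integer column).
          lit(value).alias(target)
        case _ => throw new IllegalArgumentException(
          s"Entry for $target has neither a usable sourceColumn nor a selectvalue")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell a session named spark already exists.
    val spark = SparkSession.builder().master("local[*]").appName("mapping-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(("John", "Non Hf", "New york"), ("Steav", "Non Hf", "Mumbai"), ("Ram", "HF", "Boston"))
      .toDF("srcColumnZ", "srcColumnY", "srcColumnR")
    val mapList: List[Map[String, Any]] = List(
      Map("targetColumn" -> "columnNameX",
          "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")),
      Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")),
      Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5)
    )

    df.select(buildColumns(df, mapList): _*).show()
    spark.stop()
  }
}
```

If columnNameX should take the names (John, Steav, Ram) as in the question's expected output, one option is to list srcColumnZ before srcColumnY in its sourceColumn list, since the first existing candidate wins.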