How to create a DataFrame from a List[Map] based on a condition

I have a DataFrame named DF1, shown below.

DF1:

+----------+----------+----------+
|srcColumnZ|srcColumnY|srcColumnR|
+----------+----------+----------+
|John      |Non Hf    |New york  |
|Steav     |Non Hf    |Mumbai    |
|Ram       |HF        |Boston    |
+----------+----------+----------+

I also have a list of maps that holds the source-to-target column mappings, shown below.

List(Map(targetColumn -> columnNameX, sourceColumn -> List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR)), Map(targetColumn -> columnNameY, sourceColumn -> List(srcColumnY)), Map(targetColumn -> columnNameZ, selectvalue -> 5))

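I want to create a DataFrame from the list of maps above. It should contain columnNameX, columnNameY and columnNameZ as columns (as given by the list), and the values of these columns are based on sourceColumn. That is, if sourceColumn is present, such as List(srcColumnX, srcColumnY, srcColumnZ, srcColumnP, srcColumnQ, srcColumnR), then those columns are checked one by one against DF1, and as soon as the first one matches, all values of that column are copied into the target column; the same applies to the next target column. If selectvalue is present instead of sourceColumn, that value is hardcoded into the whole column. For example, in the list above the target column columnNameZ has selectvalue 5.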
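The selection rule above can be sketched in plain Scala, without Spark, to make the first-match behaviour explicit. This is a minimal sketch, not the asker's code: the names ColumnSpec, FromColumn, Constant and ResolveMappings.resolve are hypothetical, and it assumes the mappings are typed as List[Map[String, Any]] and that a target with no matching source column should be an error.

```scala
// Hypothetical sketch: resolve each mapping into a typed spec using the
// "first candidate that exists in DF1 wins" rule. Plain Scala, no Spark.
sealed trait ColumnSpec { def target: String }
final case class FromColumn(target: String, source: String) extends ColumnSpec
final case class Constant(target: String, value: Any) extends ColumnSpec

object ResolveMappings {
  def resolve(mappings: List[Map[String, Any]], available: Seq[String]): List[ColumnSpec] =
    mappings.map { m =>
      val target = m("targetColumn").toString
      m.get("sourceColumn") match {
        case Some(candidates: List[_]) =>
          // Scan candidates in order and keep the first one present in the DataFrame
          candidates.map(_.toString).find(available.contains(_)) match {
            case Some(src) => FromColumn(target, src)
            case None      => throw new IllegalArgumentException(s"No source column for $target")
          }
        case _ =>
          // No sourceColumn key: the same selectvalue is used for every row
          Constant(target, m.getOrElse("selectvalue", null))
      }
    }

  def main(args: Array[String]): Unit = {
    val mapList: List[Map[String, Any]] = List(
      Map("targetColumn" -> "columnNameX", "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")),
      Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")),
      Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5))
    // DF1's columns, as listed in the question
    resolve(mapList, Seq("srcColumnZ", "srcColumnY", "srcColumnR")).foreach(println)
  }
}
```

Run against DF1's columns, it prints FromColumn(columnNameX,srcColumnY), FromColumn(columnNameY,srcColumnY) and Constant(columnNameZ,5): srcColumnX is not in DF1, so srcColumnY, the next candidate in the list, is the first match for columnNameX.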

Below is the expected output.

+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
|John       |Non Hf     |5          |
|Steav      |Non Hf     |5          |
|Ram        |HF         |5          |
+-----------+-----------+-----------+

The main thing here is to generate a query list (a list of Spark Columns) from the given maps. You can do it like this:

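//Imports (assumes spark-shell or an existing SparkSession named spark)
import org.apache.spark.sql.functions.lit
import spark.implicits._

//Input DF
val df = Seq(("John","Non Hf","New york"),("Steav","Non Hf","Mumbai"),("Ram","HF","Boston")).toDF("srcColumnZ", "srcColumnY", "srcColumnR")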

//Input List

val mapList: List[Map[String, Any]] = List(Map("targetColumn" -> "columnNameX", "sourceColumn" -> List("srcColumnX", "srcColumnY", "srcColumnZ", "srcColumnP", "srcColumnQ", "srcColumnR")), Map("targetColumn" -> "columnNameY", "sourceColumn" -> List("srcColumnY")), Map("targetColumn" -> "columnNameZ", "selectvalue" -> 5))

//Get all the columns of df as list

val dfCols=df.columns.toList

//Then generate the query list: one Column per target column

val query = mapList.map { mp =>
  val target = mp("targetColumn").toString
  mp.get("sourceColumn") match {
    case Some(srcColumns: List[_]) =>
      // Take the first candidate source column that exists in df
      val srcCol = srcColumns.map(_.toString).find(dfCols.contains(_))
        .getOrElse(throw new IllegalArgumentException(s"No source column for $target found in df"))
      df.col(srcCol).alias(target)
    case _ =>
      // No sourceColumn: put the selectvalue (e.g. 5) into every row
      lit(mp.getOrElse("selectvalue", null)).alias(target)
  }
}

//Finally, run the query

df.select(query:_*).show

//Sample output:

+-----------+-----------+-----------+
|columnNameX|columnNameY|columnNameZ|
+-----------+-----------+-----------+
|     Non Hf|     Non Hf|          5|
|     Non Hf|     Non Hf|          5|
|         HF|         HF|          5|
+-----------+-----------+-----------+
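If you prefer SQL expression strings over Column objects, the same mappings can be turned into strings for df.selectExpr. This is a hedged sketch of an alternative, not part of the answer above: MappingExprs.toExprs is a hypothetical helper, it backtick-quotes identifiers, and it only renders Int, Long and Boolean constants unquoted (null becomes NULL, anything else becomes an escaped string literal).

```scala
// Hypothetical helper: turn the mapping list into SQL expression strings
// that could be passed to Dataset.selectExpr. Plain Scala, no Spark needed here.
object MappingExprs {
  // Backtick-quote an identifier, doubling any embedded backticks
  private def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  // Render a constant as a SQL literal; Int/Long/Boolean stay unquoted,
  // null becomes NULL, everything else becomes an escaped string literal
  private def literal(v: Any): String = v match {
    case null       => "NULL"
    case n: Int     => n.toString
    case n: Long    => n.toString
    case b: Boolean => b.toString
    case other      => "'" + other.toString.replace("\\", "\\\\").replace("'", "\\'") + "'"
  }

  def toExprs(mappings: List[Map[String, Any]], available: Seq[String]): List[String] =
    mappings.map { m =>
      val target = m("targetColumn").toString
      val source = m.get("sourceColumn") match {
        case Some(candidates: List[_]) =>
          // First candidate present in the DataFrame wins
          candidates.map(_.toString).find(available.contains(_)).map(quote)
            .getOrElse(throw new IllegalArgumentException(s"No source column for $target"))
        case _ =>
          // No sourceColumn: use selectvalue as a constant
          literal(m.getOrElse("selectvalue", null))
      }
      s"$source AS ${quote(target)}"
    }
}
```

For the mapList above it yields one expression per target, ending with 5 AS `columnNameZ` for the constant, and the list can be passed as df.selectExpr(MappingExprs.toExprs(mapList, df.columns.toSeq): _*). Building Column objects, as in the answer, avoids hand-written quoting; the string form is mainly handy when the expressions need to be logged or stored.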