How to append a constant to all header columns in Spark Scala

For example, this is my existing header:

DataPartition|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime|^|SourceTypeCode|^|DocumentId|^|Dcn|^|DocFormat|^|StatementDate|^|IsFilingDateTimeEstimated|^|ContainsPreliminaryData|^|CapitalChangeAdjustmentDate|^|CumulativeAdjustmentFactor|^|ContainsRestatement|^|FilingDateTimeUTCOffset|^|ThirdPartySourceCode|^|ThirdPartySourcePriority|^|SourceTypeId|^|ThirdPartySourceCodeId|^|FFAction|!|

I want to create the header as shown below:

DataPartition_1|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime_1|^|SourceTypeCode_1|^|DocumentId_1|^|Dcn_1|^|DocFormat_1|^|StatementDate_1|^|IsFilingDateTimeEstimated_1|^|ContainsPreliminaryData_1|^|CapitalChangeAdjustmentDate_1|^|CumulativeAdjustmentFactor_1|^|ContainsRestatement_1|^|FilingDateTimeUTCOffset_1|^|ThirdPartySourceCode_1|^|ThirdPartySourcePriority_1|^|SourceTypeId_1|^|ThirdPartySourceCodeId_1|^|FFAction_1

I want to append _1 to all header columns except TimeStamp|^|Source.organizationId|^|Source.sourceId.

I have already done this using withColumn, but I had to apply it to every column individually.

Is there a simpler way, such as using foldLeft?

First, you need to define the list of columns to skip:

val columnsToAvoid = List("TimeStamp","Source.organizationId","Source.sourceId")

Then you can foldLeft over the DataFrame's list of columns (given by df.columns), renaming each column that is not contained in the columnsToAvoid list, and otherwise returning the DataFrame unchanged:

df.columns.foldLeft(df) { (acc, elem) =>
  if (columnsToAvoid.contains(elem)) acc
  else acc.withColumnRenamed(elem, elem + "_1")
}

Here is a quick example:

Original DF:

+-----+------+-----------+
| word| value|  TimeStamp|
+-----+------+-----------+
|wordA|valueA|45435345435|
|wordB|valueB|  454244345|
|wordC|valueC|32425425435|
+-----+------+-----------+

Operation:

df.columns.foldLeft(df)((acc, elem) => if (columnsToAvoid.contains(elem)) acc else acc.withColumnRenamed(elem, elem+"_1")).show

Result:

+------+-------+-----------+
|word_1|value_1|  TimeStamp|
+------+-------+-----------+
| wordA| valueA|45435345435|
| wordB| valueB|  454244345|
| wordC| valueC|32425425435|
+------+-------+-----------+
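The same rename logic can be sanity-checked without a Spark session by applying it to plain column-name strings. This is a minimal sketch (not the Spark answer itself) that splits a shortened version of the question's header on the |^| delimiter and suffixes each name the same way the foldLeft does:

```scala
// Columns that should keep their original names.
val columnsToAvoid = List("TimeStamp", "Source.organizationId", "Source.sourceId")

// Suffix a column name with "_1" unless it is in the skip list.
def rename(col: String): String =
  if (columnsToAvoid.contains(col)) col else col + "_1"

val header = "DataPartition|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime"

// Escape the delimiter so split treats it literally, not as a regex.
val renamed = header.split("\\|\\^\\|").map(rename).mkString("|^|")
// renamed == "DataPartition_1|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime_1"
```

Note that withColumnRenamed leaves the DataFrame unchanged if the given column does not exist, so columns listed in columnsToAvoid are safely skipped either way.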