由于 .在 spark 的列名中
Extracting value from data frame thorws error because of the . in the column name in spark
这是我现有的数据框
+------------------+-------------------------+-----------+---------------+-------------------------+---------------------------+------------------------+--------------------------+---------------+-----------+----------------+-----------------+----------------------+--------------------------+-----------+--------------------+-----------+--------------------------------------------------------------------------------------------+-----------------------+------------------+-----------------------------+-----------------------+----------------------------------+
|DataPartition |TimeStamp |_lineItemId|_organizationId|fl:FinancialConceptGlobal|fl:FinancialConceptGlobalId|fl:FinancialConceptLocal|fl:FinancialConceptLocalId|fl:InstrumentId|fl:IsCredit|fl:IsDimensional|fl:IsRangeAllowed|fl:IsSegmentedByOrigin|fl:SegmentGroupDescription|fl:Segments|fl:StatementTypeCode|FFAction|!||LineItemName |LineItemName.languageId|LocalLanguageLabel|LocalLanguageLabel.languageId|SegmentChildDescription|SegmentChildDescription.languageId|
+------------------+-------------------------+-----------+---------------+-------------------------+---------------------------+------------------------+--------------------------+---------------+-----------+----------------+-----------------+----------------------+--------------------------+-----------+--------------------+-----------+--------------------------------------------------------------------------------------------+-----------------------+------------------+-----------------------------+-----------------------+----------------------------------+
|SelfSourcedPrivate|2017-11-02T10:23:59+00:00|3 |4298009288 |XTOT |3016350 |null |null |null |true |false |false |false |null |null |BAL |I|!| |Total Assets |505074 |null |null |null |null |
这是上面数据框的模式
root
|-- DataPartition: string (nullable = true)
|-- TimeStamp: string (nullable = true)
|-- _lineItemId: long (nullable = true)
|-- _organizationId: long (nullable = true)
|-- fl:FinancialConceptGlobal: string (nullable = true)
|-- fl:FinancialConceptGlobalId: long (nullable = true)
|-- fl:FinancialConceptLocal: string (nullable = true)
|-- fl:FinancialConceptLocalId: long (nullable = true)
|-- fl:InstrumentId: long (nullable = true)
|-- fl:IsCredit: boolean (nullable = true)
|-- fl:IsDimensional: boolean (nullable = true)
|-- fl:IsRangeAllowed: boolean (nullable = true)
|-- fl:IsSegmentedByOrigin: boolean (nullable = true)
|-- fl:SegmentGroupDescription: string (nullable = true)
|-- fl:Segments: struct (nullable = true)
| |-- fl:SegmentSequence: struct (nullable = true)
| | |-- _VALUE: long (nullable = true)
| | |-- _segmentId: long (nullable = true)
|-- fl:StatementTypeCode: string (nullable = true)
|-- FFAction|!|: string (nullable = true)
|-- LineItemName: string (nullable = true)
|-- LineItemName.languageId: long (nullable = true)
|-- LocalLanguageLabel: string (nullable = true)
|-- LocalLanguageLabel.languageId: long (nullable = true)
|-- SegmentChildDescription: string (nullable = true)
|-- SegmentChildDescription.languageId: long (nullable = true)
我想使用以下代码重命名数据框的 header 列。
val temp = dfTypeNew.select(dfTypeNew.columns.filter(x => !x.equals("fl:Segments")).map(x => col(x).as(x.replace("_", "LineItem_").replace("fl:", ""))): _*)
当我这样做时,出现以下错误
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Can't extract value from LineItemName#368: need struct type but got
string;
当我在没有 .
的情况下重命名我的列时,我能够提取
出现错误是因为 (.)
点用于访问 struct
字段
要读取具有列名的字段,请使用反引号,如下所示
val df = Seq(
("a","b","c"),
("a","b","c")
).toDF("x", "y", "z.z")
df.select("x", "`z.z`").show(false)
输出
+---+---+
|a |c.c|
+---+---+
|a |c |
|a |c |
+---+---+
希望对您有所帮助!
由 Ramesh 编辑
@Anupam,您所要做的就是使用 Shankar 在您的代码中建议的上述技术
val temp = dfTypeNew.select(dfTypeNew.columns.filter(x => !x.equals("fl:Segments")).map(x => col(s"`${x}`").as(x.replace("_", "LineItem_").replace("fl:", ""))): _*)
就这些了。
这是我现有的数据框
+------------------+-------------------------+-----------+---------------+-------------------------+---------------------------+------------------------+--------------------------+---------------+-----------+----------------+-----------------+----------------------+--------------------------+-----------+--------------------+-----------+--------------------------------------------------------------------------------------------+-----------------------+------------------+-----------------------------+-----------------------+----------------------------------+
|DataPartition |TimeStamp |_lineItemId|_organizationId|fl:FinancialConceptGlobal|fl:FinancialConceptGlobalId|fl:FinancialConceptLocal|fl:FinancialConceptLocalId|fl:InstrumentId|fl:IsCredit|fl:IsDimensional|fl:IsRangeAllowed|fl:IsSegmentedByOrigin|fl:SegmentGroupDescription|fl:Segments|fl:StatementTypeCode|FFAction|!||LineItemName |LineItemName.languageId|LocalLanguageLabel|LocalLanguageLabel.languageId|SegmentChildDescription|SegmentChildDescription.languageId|
+------------------+-------------------------+-----------+---------------+-------------------------+---------------------------+------------------------+--------------------------+---------------+-----------+----------------+-----------------+----------------------+--------------------------+-----------+--------------------+-----------+--------------------------------------------------------------------------------------------+-----------------------+------------------+-----------------------------+-----------------------+----------------------------------+
|SelfSourcedPrivate|2017-11-02T10:23:59+00:00|3 |4298009288 |XTOT |3016350 |null |null |null |true |false |false |false |null |null |BAL |I|!| |Total Assets |505074 |null |null |null |null |
这是上面数据框的模式
root
|-- DataPartition: string (nullable = true)
|-- TimeStamp: string (nullable = true)
|-- _lineItemId: long (nullable = true)
|-- _organizationId: long (nullable = true)
|-- fl:FinancialConceptGlobal: string (nullable = true)
|-- fl:FinancialConceptGlobalId: long (nullable = true)
|-- fl:FinancialConceptLocal: string (nullable = true)
|-- fl:FinancialConceptLocalId: long (nullable = true)
|-- fl:InstrumentId: long (nullable = true)
|-- fl:IsCredit: boolean (nullable = true)
|-- fl:IsDimensional: boolean (nullable = true)
|-- fl:IsRangeAllowed: boolean (nullable = true)
|-- fl:IsSegmentedByOrigin: boolean (nullable = true)
|-- fl:SegmentGroupDescription: string (nullable = true)
|-- fl:Segments: struct (nullable = true)
| |-- fl:SegmentSequence: struct (nullable = true)
| | |-- _VALUE: long (nullable = true)
| | |-- _segmentId: long (nullable = true)
|-- fl:StatementTypeCode: string (nullable = true)
|-- FFAction|!|: string (nullable = true)
|-- LineItemName: string (nullable = true)
|-- LineItemName.languageId: long (nullable = true)
|-- LocalLanguageLabel: string (nullable = true)
|-- LocalLanguageLabel.languageId: long (nullable = true)
|-- SegmentChildDescription: string (nullable = true)
|-- SegmentChildDescription.languageId: long (nullable = true)
我想使用以下代码重命名数据框的 header 列。
val temp = dfTypeNew.select(dfTypeNew.columns.filter(x => !x.equals("fl:Segments")).map(x => col(x).as(x.replace("_", "LineItem_").replace("fl:", ""))): _*)
当我这样做时,出现以下错误
Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from LineItemName#368: need struct type but got string;
当我在没有 .
的情况下重命名我的列时,我能够提取
出现错误是因为 (.)
点用于访问 struct
字段
要读取具有列名的字段,请使用反引号,如下所示
val df = Seq(
("a","b","c"),
("a","b","c")
).toDF("x", "y", "z.z")
df.select("x", "`z.z`").show(false)
输出
+---+---+
|a |c.c|
+---+---+
|a |c |
|a |c |
+---+---+
希望对您有所帮助!
由 Ramesh 编辑
@Anupam,您所要做的就是使用 Shankar 在您的代码中建议的上述技术
val temp = dfTypeNew.select(dfTypeNew.columns.filter(x => !x.equals("fl:Segments")).map(x => col(s"`${x}`").as(x.replace("_", "LineItem_").replace("fl:", ""))): _*)
就这些了。