Spark Scala DataFrame column position

Question

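I have a DataFrame on which I perform a drop and a join to change the value of a column. After this change the column positions in the DataFrame change, and because I build the schema dynamically from the table, the insert fails: the DataFrame no longer matches the schema.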

Example:

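import org.apache.spark.sql.functions.lit

val df = spark.sql("SELECT 'yes' AS x, a, b, c, d FROM aaaa, bbbb")
val originaldf = spark.sql("SELECT a, b, c, d FROM aaaa")
val temp1 = df.drop("x")
val join = originaldf.except(temp1)
val temp2 = join.drop("c")
// x is the constant 'yes'; df("x") would not resolve against temp2, which has no x column
temp2.withColumn("c", lit("yes"))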

I now apply the schema to temp2, but temp2's columns are now c, a, b, d instead of a, b, c, d. Is there a way to reorder them in the DataFrame or anywhere else?
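Assuming the insert matches columns by position (which is why a reordered DataFrame no longer fits the dynamically built schema), the mismatch can be located by comparing the two column lists slot by slot. The sketch below is plain Python so it runs without Spark; `positional_mismatches` is a hypothetical helper, and in practice the lists would come from `temp2.columns` and the table's schema.

```python
from typing import List, Tuple


def positional_mismatches(actual: List[str], expected: List[str]) -> List[Tuple[int, str, str]]:
    """Return (position, actual_name, expected_name) wherever the names differ.

    Hypothetical helper: it requires equal lengths, which holds when the
    DataFrame has the right columns but in the wrong order.
    """
    if len(actual) != len(expected):
        raise ValueError(f"column count differs: {len(actual)} vs {len(expected)}")
    return [
        (i, a, e)
        for i, (a, e) in enumerate(zip(actual, expected))
        if a != e
    ]


# The situation from the question: temp2 ends up as c, a, b, d.
print(positional_mismatches(["c", "a", "b", "d"], ["a", "b", "c", "d"]))
# [(0, 'c', 'a'), (1, 'a', 'b'), (2, 'b', 'c')]
```

Only position 3 (d) lines up, so every earlier slot would be written into the wrong column by a positional insert.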

Thanks

Answer 1

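Just use select:

>>> from pyspark.sql.functions import lit
>>> temp2.withColumn('c', lit('yes')).select("a", "b", "c", "d")

Or:

>>> temp3 = temp2.withColumn('c', lit('yes'))
>>> temp3.select(sorted(temp3.columns))

Here lit('yes') stands in for df('x'): x is the constant 'yes', calling df('x') raises a TypeError in PySpark because a DataFrame is not callable, and df's x column would not resolve against temp2 anyway. The sorted() variant only works because the target order a, b, c, d happens to be alphabetical.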
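Since the schema is built dynamically from the table, a more general approach takes the target order from that schema and rearranges the DataFrame's columns by name. The sketch below only computes the list to pass to `select`; `reorder_columns` is a hypothetical helper, written in plain Python so it runs without Spark, and the case-insensitive matching is an assumption (table schemas and DataFrame columns can differ in case).

```python
from typing import Callable, Dict, List


def reorder_columns(current: List[str], target: List[str],
                    case_sensitive: bool = False) -> List[str]:
    """Return the names in `current`, rearranged into the order of `target`.

    Hypothetical helper. Matching is case-insensitive by default (an
    assumption); the DataFrame's own spelling is kept in the result.
    """
    key: Callable[[str], str] = (lambda n: n) if case_sensitive else str.lower
    by_key: Dict[str, str] = {key(name): name for name in current}
    if len(by_key) != len(current):
        raise ValueError("current columns contain duplicates under the matching rule")
    target_keys = {key(name) for name in target}
    missing = [name for name in target if key(name) not in by_key]
    extra = [name for name in current if key(name) not in target_keys]
    if missing or extra:
        # Refuse rather than silently drop or invent columns; the goal is only to reorder.
        raise ValueError(f"missing={missing}, extra={extra}")
    return [by_key[key(name)] for name in target]


print(reorder_columns(["c", "a", "b", "d"], ["a", "b", "c", "d"]))  # ['a', 'b', 'c', 'd']
```

With PySpark the result would be unpacked into select, e.g. `temp3.select(*reorder_columns(temp3.columns, spark.table("aaaa").columns))`; reading the target order from `spark.table("aaaa")` is an assumption about where the schema comes from. Raising on missing or extra names keeps a schema drift from being hidden by the reorder.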


Tags: apache-spark, apache-spark-sql, spark-dataframe