How to drop multiple column names given in a list from Spark DataFrame?

Question

I have a dynamic list that is created based on the value of n.

n = 3
drop_lst = ['a' + str(i) for i in range(n)]
df.drop(drop_lst)

But the above does not work.

Note:

My use case requires a dynamic list.

If I just do the following, without a list, it works:

df.drop('a0','a1','a2')

How can I make the drop function work with a list?

Spark 2.2 does not seem to have this feature. Is there a way to make it work without using select()?

Answer 1

You can use the * operator to unpack the contents of the list into separate arguments to drop():

df.drop(*drop_lst)
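
To illustrate why the star matters, here is a minimal sketch in plain Python. The function `drop` below is a hypothetical stand-in with the same variadic `*cols` shape, and `columns` is an invented schema; it is not Spark's implementation, and the exact error Spark raises for a list argument may differ by version.

```python
def drop(columns, *cols):
    """Hypothetical stand-in for a variadic drop(*cols).

    Returns the names in `columns` that are not listed in `cols`.
    """
    for c in cols:
        if not isinstance(c, str):
            # A whole list arrives here when the caller forgets the star.
            raise TypeError(f"each col should be a string, got {type(c).__name__}")
    return [name for name in columns if name not in cols]


n = 3
drop_lst = ['a' + str(i) for i in range(n)]  # ['a0', 'a1', 'a2']
columns = ['id', 'a0', 'a1', 'a2', 'b']       # assumed schema, for illustration only

# With the star, the call is equivalent to drop(columns, 'a0', 'a1', 'a2').
remaining = drop(columns, *drop_lst)

# Without the star, the function receives ONE positional argument: the list itself.
try:
    drop(columns, drop_lst)
    list_call_failed = False
except TypeError:
    list_call_failed = True
```

The unpacked call sees three string arguments, which is the same shape as writing df.drop('a0','a1','a2') by hand.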

Answer 2

You can use drop(*cols) in two ways.

See the official documentation for DataFrame.drop.

Answer 3

You can pass the column names as comma-separated arguments, for example:

df.drop("col1","col11","col21")

Answer 4

Here is how to drop a specified number of consecutive columns in Scala:

val ll = dfwide.schema.names.slice(1,5)
dfwide.drop(ll:_*).show

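slice takes two parameters, a start index and an end index. The end index is exclusive, so slice(1,5) selects the names at positions 1 through 4.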
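
For PySpark readers, the same selection can be sketched with an ordinary Python list slice, which is also half-open. The list `names` below is a hypothetical stand-in for `df.columns`; with a real DataFrame the call would presumably be `df.drop(*df.columns[1:5])`.

```python
# Hypothetical column names standing in for df.columns.
names = ['id', 'c1', 'c2', 'c3', 'c4', 'c5']

# Half-open slice: positions 1, 2, 3 and 4 (position 5 is excluded).
to_drop = names[1:5]

# What would remain after dropping the sliced names.
kept = [c for c in names if c not in to_drop]
```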

Answer 5

Use a simple loop:

for c in drop_lst:
   df = df.drop(c)
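
The reassignment in the loop matters because drop() returns a new DataFrame instead of modifying the existing one. The sketch below shows this with `Frame`, a hypothetical immutable stand-in class (not a Spark type), and also writes the same loop as a fold with `functools.reduce`.

```python
from functools import reduce


class Frame:
    """Hypothetical immutable stand-in for a DataFrame: drop() returns a new Frame."""

    def __init__(self, columns):
        self.columns = tuple(columns)

    def drop(self, *cols):
        return Frame(c for c in self.columns if c not in cols)


df = Frame(['id', 'a0', 'a1', 'a2'])
drop_lst = ['a0', 'a1', 'a2']

# The result is discarded here, so df keeps all of its columns.
df.drop('a0')
unchanged = df.columns

# The loop from the answer: rebind the name on every iteration.
looped = df
for c in drop_lst:
    looped = looped.drop(c)

# The same loop expressed as a fold over the list of names.
folded = reduce(lambda acc, c: acc.drop(c), drop_lst, df)
```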
