Referencing columns in a PySpark DataFrame

Question

Suppose I have a list of words that has been converted to a DataFrame:

 ------
| word |
 ------
| cat  |
| bird |
| dog  |
| ...  |
 ------

Then I try to count the letters in each word:

from pyspark.sql.functions import length

letter_count_df = words_df.select(length(words_df.word))

I know the result is a DataFrame with a single column.

How can I refer to the only column of letter_count_df without using alias?

 --------------
| length(word) |
 --------------
|            3 |
|            4 |
|            3 |
|          ... |
 --------------

Answer 1

By name:

>>> letter_count_df.select(c)
DataFrame[length(word): int]

Or with col and the name:

>>> from pyspark.sql.functions import col
>>> letter_count_df.select(col(c))

where c is a constant:

>>> c = "length(word)"

or

>>> c = letter_count_df.columns[0]
