How to do a nested loop using apply in pandas

Question

I have a data frame like this:

text,                pos
No thank you.        [(No, DT), (thank, NN), (you, PRP)]
They didn't respond  [(They, PRP), (didn't, VBP), (respond, JJ)]

I want to apply a function to pos and save the result in a new column, so the output looks like this:

text,                pos                                           score
No thank you.        [(No, DT), (thank, NN), (you, PRP)]        [[0.0, 0.0, 1.0], [], [0.5, 0.0, 0.45]]
They didn't respond  [(They, PRP), (didn't, VBP), (respond, JJ)]  [[0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]

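So the function returns a list for each tuple in the list (the implementation of the function is not the point here; for that I simply call get_sentiment). I can do this with nested loops, but I don't like that. I want to do it in a more Pythonic, pandas DataFrame way.

This is what I have tried so far: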

df['score'] = df['pos'].apply(lambda k: [get_sentiment(x,y) for j in k for (x,y) in j])

However, it raises this error:

ValueError: too many values to unpack (expected 2)

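There are several similar questions on SO, but the answers are in R.

To be clearer:

get_sentiment is a function in NLTK that assigns a list of scores to each word (the list is [positive score, negative score, objectivity score]). Overall, I need to apply that function to the pos column of my DataFrame.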

Answer 1

In your case:

df['score'] = df['pos'].apply(lambda k: [get_sentiment(j[0],j[1]) for j in k ])

Answer 2

Let's take Pandas out of the equation and create a minimal reproducible example of the problem, which has to do with the lambda itself:

def mock_sentiment(word, pos):
    return len(word) * 0.1, 0, len(pos) * 0.1

data = [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')]

[mock_sentiment(x, y) for j in data for (x,y) in j] # reproduces the error

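The problem is that each j in data (for example, ('No', 'DT')) is a single tuple that we want to unpack into x, y values. By iterating in j, we get the individual strings ('No' and 'DT'), and then we try to unpack those into x and y. That happens to work for 'No' and 'DT', because each has exactly two characters, but not for strings of other lengths such as 'thank', and even when it works it is not the intended result.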
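To make the failure concrete, here is a minimal sketch that rewrites the nested comprehension as explicit loops. The names calls and error_message are introduced only for this illustration; no scoring function is needed to trigger the error, since it happens during unpacking.

```python
data = [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')]

calls = []            # (x, y) pairs that unpacked without error
error_message = None  # filled in when unpacking fails

try:
    for j in data:        # j is a whole tuple, e.g. ('No', 'DT')
        for item in j:    # item is a single string, e.g. 'No'
            x, y = item   # unpacks the characters of the string
            calls.append((x, y))
except ValueError as exc:
    error_message = str(exc)

print(calls)          # the two-character strings split into characters
print(error_message)  # the five-character 'thank' cannot fit into x, y
```

The first tuple passes only because 'No' and 'DT' both have two characters, which yields ('N', 'o') and ('D', 'T') rather than the intended ('No', 'DT'). The loop then stops at 'thank'.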

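Since j is already the tuple we want to unpack, what we want is to unpack it right there, using (x, y) instead of j as the iteration target, without any nested comprehension: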

[mock_sentiment(x, y) for (x, y) in data] # works as expected

So that is what we want the lambda to return to Pandas in the real code (substituting back your names and the real sentiment function):

df['score'] = df['pos'].apply(lambda k: [get_sentiment(x, y) for (x, y) in k])
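For completeness, the same pattern can be tried end to end on a small DataFrame. This is only a sketch: mock_sentiment is a hypothetical stand-in for get_sentiment that returns integer lengths so the output is easy to check, and the DataFrame is rebuilt by hand from the question's sample rows.

```python
import pandas as pd

def mock_sentiment(word, pos):
    # Hypothetical stand-in for get_sentiment: returns a three-item list
    # built from string lengths instead of real sentiment scores.
    return [len(word), 0, len(pos)]

df = pd.DataFrame({
    'text': ['No thank you.', "They didn't respond"],
    'pos': [
        [('No', 'DT'), ('thank', 'NN'), ('you', 'PRP')],
        [('They', 'PRP'), ("didn't", 'VBP'), ('respond', 'JJ')],
    ],
})

# One list of scores per (word, tag) tuple, collected per row.
df['score'] = df['pos'].apply(
    lambda k: [mock_sentiment(x, y) for (x, y) in k]
)

print(df['score'].tolist())
```

Each cell of the new score column holds one inner list per tuple in pos, which matches the shape of the output asked for in the question.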


Tags: python, nested-loops, pandas