Accessing groups in a Pandas lambda function

Question

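I have a Pandas dataframe with a MultiIndex on its columns. Level 0 is 'Strain' and level 1 is 'JGI library'. Each 'Strain' has several 'JGI library' columns associated with it. I want to use a lambda function to apply a t-test that compares two different strains. For troubleshooting, I have been using .iloc[0] to grab a single row of the dataframe: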

from scipy.stats import ttest_ind

row = countsDf4.iloc[0]
parent = 'LL1004'
child = 'LL345'
ttest_ind(row.groupby(level='Strain').get_group(parent), row.groupby(level='Strain').get_group(child))[1]

This works as expected. Now I try to apply it to my whole dataframe:

parent = 'LL1004'
child = 'LL345'
pvalDf = countsDf4.apply(lambda row: ttest_ind(row.groupby(level='Strain').get_group(parent), row.groupby(level='Strain').get_group(child))[1])

Now I get an error message saying "ValueError: ('level name Strain is not the name of the index', 'occurred at index (LL1004, BCHAC)')"

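'LL1004' is a 'Strain', but Pandas doesn't seem to recognize that. Could it be that the MultiIndex isn't being passed to the lambda function correctly? Is there a better way to troubleshoot the lambda function than using .iloc[0]?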
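One way to troubleshoot a lambda passed to apply() is to move its body into a named function (strain_pvalue below is a hypothetical name) and call that function directly on one row and on one column, since a single column is what apply() hands over when axis is left at its default. This is only a sketch: the toy frame mimics the question's (Strain, JGI library) column layout, but the row labels, the numbers and every library name except BCHAC are made up.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical toy frame: only the (Strain, JGI library) column layout mirrors the question.
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "BCHAC"), ("LL1004", "LIB2"), ("LL345", "LIB3"), ("LL345", "LIB4")],
    names=["Strain", "JGI library"],
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=["gene_a", "gene_b"], columns=columns)

parent, child = "LL1004", "LL345"


def strain_pvalue(s):
    """Same body as the lambda in the question, written as a named (hypothetical) function."""
    groups = s.groupby(level="Strain")
    return ttest_ind(groups.get_group(parent), groups.get_group(child))[1]


# One row: indexed by the (Strain, JGI library) MultiIndex, so the groupby works.
single_row_p = strain_pvalue(df.iloc[0])

# One column (what apply() passes with the default axis=0): indexed by the
# row labels, which have no level named 'Strain', so the groupby raises.
try:
    strain_pvalue(df.iloc[:, 0])
    column_error = None
except (KeyError, ValueError) as err:
    column_error = err

print(single_row_p)
print(repr(column_error))
```

Calling the named function on df.iloc[:, 0] reproduces the same kind of "level name Strain is not the name of the index" error as the full apply() call, which points at what apply() is passing in rather than at the MultiIndex itself.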

I have put a copy of my Jupyter notebook and an Excel file with the countsDf4 dataframe on Github: https://github.com/danolson1/pandas_ttest

Thanks, Dan

Answer 1

How about something simpler:

# axis=1 applies the lambda row by row; [1] takes the p-value from ttest_ind's result
pvalDf = countsDf4.apply(lambda row: ttest_ind(row[parent], row[child])[1], axis=1)

I've tested it on your notebook and it works.
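For a self-contained check outside the notebook, here is a minimal sketch of the axis=1 version on a made-up frame. The counts, the row labels and all library names except BCHAC are hypothetical; the [1] picks the p-value out of the (statistic, pvalue) result of scipy.stats.ttest_ind.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical counts; only the (Strain, JGI library) column layout mirrors the question.
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "BCHAC"), ("LL1004", "LIB2"), ("LL1004", "LIB3"),
     ("LL345", "LIB4"), ("LL345", "LIB5"), ("LL345", "LIB6")],
    names=["Strain", "JGI library"],
)
countsDf4 = pd.DataFrame(
    [[10, 12, 11, 30, 29, 31],
     [5, 6, 5, 5, 7, 6]],
    index=["gene_a", "gene_b"],  # hypothetical row labels
    columns=columns,
)

parent, child = "LL1004", "LL345"

# axis=1: the lambda receives one row at a time, indexed by (Strain, JGI library),
# so row[parent] selects every library column that belongs to that strain.
pvalDf = countsDf4.apply(lambda row: ttest_ind(row[parent], row[child])[1], axis=1)
print(pvalDf)
```

The result is a Series with one p-value per row label, so it can be joined back onto countsDf4 or filtered directly.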

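Your problem is that DataFrame.apply() applies the function to each column by default (axis=0), not to each row. Each column is passed in as a Series indexed by the row labels of countsDf4, and that index has no level named 'Strain', hence the ValueError; the (LL1004, BCHAC) in the message is the name of the column being processed when it failed. You need to pass axis=1 to override the default and apply the function row by row, so that each row arrives as a Series indexed by the (Strain, JGI library) MultiIndex.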
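The difference between the two axes can be seen by asking the applied function whether the Series it receives has a level named 'Strain'. A sketch on a made-up frame (has_strain_level is a hypothetical helper, and the numbers and labels are invented):

```python
import pandas as pd

# Hypothetical toy frame with the question's column layout.
columns = pd.MultiIndex.from_tuples(
    [("LL1004", "BCHAC"), ("LL1004", "LIB2"), ("LL345", "LIB3"), ("LL345", "LIB4")],
    names=["Strain", "JGI library"],
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], index=["gene_a", "gene_b"], columns=columns)


def has_strain_level(s):
    # True if the Series handed to the function is indexed by a level named 'Strain'
    return "Strain" in s.index.names


by_column = df.apply(has_strain_level)       # default axis=0: one call per column
by_row = df.apply(has_strain_level, axis=1)  # axis=1: one call per row
print(by_column)
print(by_row)
```

With the default axis every call sees the plain row index, so 'Strain' is never found; with axis=1 every call sees the column MultiIndex.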

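Also, there is no reason to use row.groupby(level='Strain').get_group(x) when you can simply select the group of columns with row[x]: on a Series with a MultiIndex, row[x] returns every entry whose first level ('Strain') equals x. :)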
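A quick sketch comparing the two selections on a hand-built row (the values and the library labels other than BCHAC are made up):

```python
import pandas as pd

# Hypothetical row: a Series indexed like one row of countsDf4.
index = pd.MultiIndex.from_tuples(
    [("LL1004", "BCHAC"), ("LL1004", "LIB2"), ("LL345", "LIB3"), ("LL345", "LIB4")],
    names=["Strain", "JGI library"],
)
row = pd.Series([10, 12, 30, 29], index=index)

via_groupby = row.groupby(level="Strain").get_group("LL1004")
via_indexing = row["LL1004"]

# Same values; plain indexing drops the 'Strain' level, get_group keeps the full MultiIndex.
print(via_groupby)
print(via_indexing)
```

Since ttest_ind only looks at the values, either selection gives the same test result; row[x] is just shorter and avoids building a groupby object for every row.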
