Pandas method chaining and adding a new column

Question

I am trying to create a new column named "Depreciation" in Pandas, like this:

(omc_e.query('GL_account == "740190" and Posting_date > "2021-06-30"')
    .groupby(['Business_segment_item'], as_index=False)
    ['Amount_DKK']
    .sum()
    .assign( Depreciation = lambda x: 0 if x.Business_segment_item == "" else x.Amount_DKK)
)

But I get an error when I try to run it:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I can of course create a new DataFrame "test" and run the same logic row by row with apply, and that works:

test[['Business_segment_item', 'Amount_DKK']].apply(lambda x: 0 if x.Business_segment_item == "" else x.Amount_DKK, axis = 1)

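So it must be my method chaining that causes the problem. Can I avoid this error and still use method chaining, so that I don't end up with lots of temporary DataFrames?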

Answer 1

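Inside assign, the lambda receives the whole DataFrame, so x.Business_segment_item == "" is a boolean Series rather than a single True/False, and Python's if/else cannot evaluate it. Use numpy.where, which applies the condition element-wise: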

import numpy as np

(omc_e.query('GL_account == "740190" and Posting_date > "2021-06-30"')
    .groupby(['Business_segment_item'], as_index=False)
    ['Amount_DKK']
    .sum()
    # 0 where the segment is empty, otherwise the summed amount
    .assign(Depreciation=lambda x: np.where(x.Business_segment_item == "", 0, x.Amount_DKK))
)
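To see the difference end to end, here is a minimal sketch on toy data. The contents of omc_e below are invented for illustration (the real frame is not shown in the question), and Posting_date is assumed to be a datetime column. It first reproduces the ValueError from the scalar if/else, then applies np.where inside the same chain:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for omc_e; all values are made up for illustration.
omc_e = pd.DataFrame({
    "GL_account": ["740190", "740190", "740190", "740190", "999999"],
    "Posting_date": pd.to_datetime(
        ["2021-07-15", "2021-08-01", "2021-09-10", "2021-05-01", "2021-07-20"]
    ),
    "Business_segment_item": ["", "A", "A", "A", "B"],
    "Amount_DKK": [100.0, 200.0, 50.0, 999.0, 10.0],
})

# Same filter and aggregation as in the question.
summed = (
    omc_e.query('GL_account == "740190" and Posting_date > "2021-06-30"')
    .groupby(["Business_segment_item"], as_index=False)["Amount_DKK"]
    .sum()
)

# The scalar if/else fails because the comparison yields a boolean Series.
try:
    summed.assign(
        Depreciation=lambda x: 0 if x.Business_segment_item == "" else x.Amount_DKK
    )
    raised = False
except ValueError:
    raised = True

# np.where evaluates the condition row by row and returns an array.
result = summed.assign(
    Depreciation=lambda x: np.where(x.Business_segment_item == "", 0, x.Amount_DKK)
)
print(result)
```

The apply call in the question works for the same reason np.where does: with axis=1 the lambda sees one row at a time, so the comparison is a single boolean. np.where is usually the faster choice because it operates on whole columns.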

Answer 2

Another approach, which may offer a cleaner abstraction, is the case_when function from pyjanitor, which mimics SQL's CASE WHEN:

# pip install pyjanitor
import pandas as pd
import janitor
(omc_e.query('GL_account == "740190" and Posting_date > "2021-06-30"')
    .groupby(['Business_segment_item'], as_index=False)
    ['Amount_DKK']
    .sum()
    .case_when(lambda x: x.Business_segment_item == "", # condition
               0, # result if True
               lambda x: x.Amount_DKK, # result if False
               column_name = 'Depreciation')
)

More examples are available in the pyjanitor documentation for case_when.
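If adding a dependency is not an option, a similar result can probably be had with plain pandas. The sketch below is not from the answers above; it swaps in Series.mask, which replaces values where a condition is True and keeps the rest. The summed frame is a hypothetical stand-in for the output of the groupby/sum step:

```python
import pandas as pd

# Hypothetical aggregated frame, standing in for the groupby/sum result.
summed = pd.DataFrame({
    "Business_segment_item": ["", "A", "B"],
    "Amount_DKK": [100.0, 250.0, 10.0],
})

# mask sets Amount_DKK to 0 where the segment is empty, element-wise.
result = summed.assign(
    Depreciation=lambda x: x.Amount_DKK.mask(x.Business_segment_item == "", 0)
)
print(result)
```

Because mask returns a Series, the new column keeps the original index, and the call slots into a method chain the same way as the np.where version.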
