any() and all() analogues for bitwise operators in Python/Pandas

Question

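I have a pandas DataFrame with "Category" and "Total" columns. There can be 4 different categories: A, B, C, D. I am given the cutpoint value for each category as a dictionary. I need to exclude all entries whose Total exceeds the corresponding cutpoint. This works fine: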

cat = weekly_units['Category']
total = weekly_units['Total']
weekly_units = weekly_units[(cat == 'A') & (total <= cutpoints['A'])
                          | (cat == 'B') & (total <= cutpoints['B'])
                          | (cat == 'C') & (total <= cutpoints['C'])
                          | (cat == 'D') & (total <= cutpoints['D'])]

But I find it WET (not DRY) and unpythonic. Is there a way to write something like this?

weekly_units = weekly_units[any([(cat == k) & (total <= v) for k, v in cutpoints.items()])]

Answer 1

Yes. What you're looking for is numpy.logical_or:

import numpy as np

# One boolean Series per category, then OR them together element-wise.
conditions = [(cat == k) & (total <= v) for k, v in cutpoints.items()]
weekly_units = weekly_units[np.logical_or.reduce(conditions)]
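
A self-contained sketch of this approach may make it easier to try out. The sample rows and the cutpoint values below are made up for illustration and are not from the question. It also shows np.logical_and.reduce, which plays the role of all() in the same way that np.logical_or.reduce plays the role of any():

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the question.
weekly_units = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "D", "D"],
    "Total": [5, 15, 20, 7, 40, 41],
})
cutpoints = {"A": 10, "B": 20, "C": 5, "D": 40}

cat = weekly_units["Category"]
total = weekly_units["Total"]

# One boolean Series per category:
# "row belongs to category k and is within its cutpoint".
conditions = [(cat == k) & (total <= v) for k, v in cutpoints.items()]

# Element-wise OR across all conditions: the analogue of any().
keep = np.logical_or.reduce(conditions)
filtered = weekly_units[keep]

# Element-wise AND across all conditions: the analogue of all().
# Here it is False everywhere, since no row is in every category at once.
all_match = np.logical_and.reduce(conditions)
```

The built-in any() cannot be used here because it asks for the truth value of each whole Series, which pandas refuses to give for a Series with more than one element.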

Answer 2

Assuming your Category column is actually a CategoricalDtype, you can also:

weekly_units[total <= cat.cat.rename_categories(cutpoints).astype(float)]
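
The one-liner above can be unpacked as follows. This is a minimal sketch on made-up sample data (the rows and cutpoint values are assumptions). The last two lines show a similar lookup with Series.map for a plain object column, which is a substitute technique and not part of the original answer:

```python
import pandas as pd

# Hypothetical sample data; Category is stored as a categorical column.
weekly_units = pd.DataFrame({
    "Category": pd.Categorical(["A", "A", "B", "C", "D", "D"]),
    "Total": [5, 15, 20, 7, 40, 41],
})
cutpoints = {"A": 10, "B": 20, "C": 5, "D": 40}

cat = weekly_units["Category"]
total = weekly_units["Total"]

# Relabel each category with its cutpoint, then convert the labels to floats
# so that every row carries the limit that applies to it.
limits = cat.cat.rename_categories(cutpoints).astype(float)
filtered = weekly_units[total <= limits]

# Alternative lookup for a non-categorical column: map labels to cutpoints.
limits_via_map = cat.astype(object).map(cutpoints)
filtered_via_map = weekly_units[total <= limits_via_map]
```

One caveat, as far as I know: categories must be unique, so rename_categories is expected to fail if two categories share the same cutpoint value. The map-based lookup does not have that restriction.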

Answer 3

This is DRY and straightforward:

matched = False  # or matched = pd.Series(False, index=weekly_units.index)
for cat, cutpoint in cutpoints.items():
    matched |= ((weekly_units['Category'] == cat) & (weekly_units['Total'] <= cutpoint))
weekly_units = weekly_units[matched]

Note that this follows the official advice to "Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."

There is also a standard-library approach using reduce(), but as promised, it is less readable:

import functools, operator
matched = functools.reduce(
    operator.__or__,  # or lambda x, y: x | y
    (
        (weekly_units['Category'] == cat) & (weekly_units['Total'] <= cut)
        for cat, cut in cutpoints.items()
    )
)
weekly_units = weekly_units[matched]
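
If the reduce() form is needed in several places, it could be wrapped once so that call sites read like any() and all(). The helper names bitwise_any and bitwise_all below are hypothetical, and the sample data is made up; this is a sketch, not part of pandas or the standard library:

```python
import functools
import operator

import pandas as pd


def bitwise_any(masks):
    """Element-wise OR over an iterable of boolean Series (hypothetical helper)."""
    return functools.reduce(operator.or_, masks)


def bitwise_all(masks):
    """Element-wise AND over an iterable of boolean Series (hypothetical helper)."""
    return functools.reduce(operator.and_, masks)


# Made-up sample data for illustration.
weekly_units = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "D", "D"],
    "Total": [5, 15, 20, 7, 40, 41],
})
cutpoints = {"A": 10, "B": 20, "C": 5, "D": 40}

# any() analogue: a row is kept if it satisfies the condition of its category.
matched = bitwise_any(
    (weekly_units["Category"] == k) & (weekly_units["Total"] <= v)
    for k, v in cutpoints.items()
)
filtered = weekly_units[matched]

# all() analogue: rows whose Total is within every cutpoint, whatever the category.
under_every_cutpoint = bitwise_all(
    weekly_units["Total"] <= v for v in cutpoints.values()
)
```

Because no initial value is passed, functools.reduce() raises TypeError on an empty iterable, so these helpers assume at least one mask.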
