any() and all() analogues for bitwise operators in Python/Pandas
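I have a pandas DataFrame with "Category" and "Total" columns. There can be 4 different categories: A, B, C, D. I am given a cut-point value for each category as a dictionary. I need to exclude all entries whose total exceeds the corresponding cut point. This works fine: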
cat = weekly_units['Category']
total = weekly_units['Total']
weekly_units = weekly_units[(cat == 'A') & (total <= cutpoints['A'])
                            | (cat == 'B') & (total <= cutpoints['B'])
                            | (cat == 'C') & (total <= cutpoints['C'])
                            | (cat == 'D') & (total <= cutpoints['D'])]
But I find it WET (the opposite of DRY) and not Pythonic.
Is there a way to write something like this?
weekly_units = weekly_units[any([(cat == k) & (total <= v) for k, v in cutpoints.items()])]
Yes. What you're looking for is numpy.logical_or:
import numpy as np
conditions = [(cat == k) & (total <= v) for k, v in cutpoints.items()]
weekly_units = weekly_units[np.logical_or.reduce(conditions)]
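To see the reduction on concrete data, here is a minimal sketch. The sample frame and the cutpoints values are invented for illustration; only np.logical_or.reduce comes from the answer above. The built-in any() cannot be used here because it asks each boolean Series for a single truth value, which pandas rejects with a ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
weekly_units = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "D"],
    "Total": [5, 50, 20, 7, 1],
})
cutpoints = {"A": 10, "B": 25, "C": 5, "D": 3}

cat = weekly_units["Category"]
total = weekly_units["Total"]

# One boolean Series per category: right category AND within its cut point.
conditions = [(cat == k) & (total <= v) for k, v in cutpoints.items()]

# Element-wise OR across all the Series; the result is a boolean NumPy array.
mask = np.logical_or.reduce(conditions)

filtered = weekly_units[mask]
print(filtered)
```

Because each row belongs to exactly one category, at most one of the conditions can be true for it, so the OR simply picks out the rows that pass their own category's test.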
Assuming your Category column is actually a CategoricalDtype, you can also do:
weekly_units[total <= cat.cat.rename_categories(cutpoints).astype(float)]
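If the column is not categorical, a related sketch looks up each row's cut point with Series.map and does a single comparison. This is an alternative to the rename_categories call above, not the same method; the sample data is invented, and it assumes Category has a plain object (string) dtype.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
weekly_units = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "D"],
    "Total": [5, 50, 20, 7, 1],
})
cutpoints = {"A": 10, "B": 25, "C": 5, "D": 3}

# Map every row's category to its cut point (NaN for categories not in the dict).
limits = weekly_units["Category"].map(cutpoints)

# A comparison against NaN is False, so rows with unknown categories are dropped.
filtered = weekly_units[weekly_units["Total"] <= limits]
print(filtered)
```

This avoids building one mask per category, so no reduction step is needed.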
This is DRY, simple, and clear:
matched = False # or matched = pd.Series(False, index=weekly_units.index)
for cat, cutpoint in cutpoints.items():
    matched |= ((weekly_units['Category'] == cat) & (weekly_units['Total'] <= cutpoint))
weekly_units = weekly_units[matched]
Note that this follows the official advice to "Use functools.reduce() if you really need it; however, 99 percent of the time an explicit for loop is more readable."
There is also a standard-library approach using reduce(), but as promised, it is less readable:
import functools
import operator

matched = functools.reduce(
    operator.or_,  # same function as operator.__or__; or use lambda x, y: x | y
    (
        (weekly_units['Category'] == cat) & (weekly_units['Total'] <= cut)
        for cat, cut in cutpoints.items()
    ),
)
weekly_units = weekly_units[matched]
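The same patterns give an all() analogue by swapping OR for AND. A minimal sketch, with invented sample data and hypothetical bounds (min_total, max_total), that keeps only rows satisfying every condition:

```python
import functools
import operator

import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this illustration.
weekly_units = pd.DataFrame({
    "Category": ["A", "A", "B", "C", "D"],
    "Total": [5, 50, 20, 7, 1],
})
min_total, max_total = 2, 30  # hypothetical bounds

conditions = [
    weekly_units["Total"] >= min_total,
    weekly_units["Total"] <= max_total,
    weekly_units["Category"] != "C",
]

# all() analogue with the standard library: fold the list with `&`.
mask_reduce = functools.reduce(operator.and_, conditions)

# all() analogue with NumPy: element-wise AND across the list.
mask_numpy = np.logical_and.reduce(conditions)

filtered = weekly_units[mask_reduce]
print(filtered)
```

Both masks select the same rows; the functools version returns a pandas Series, while the NumPy version returns a plain boolean array.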