Getting rows from a pandas DataFrame that fulfill (a dictionary of?) requirements

Question

I want to filter the rows of a pandas DataFrame by specifying a variable set of column==value conditions.

Suppose we have a toy DataFrame like this:

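import pandas as pd
from itertools import product
from numpy.random import rand

# product(range(2), repeat=3) yields 3-tuples, one per (par1, par2, par3) combination
df = pd.DataFrame([[i, j, k, rand()] for i, j, k in product(range(2), repeat=3)],
                  columns=['par1', 'par2', 'par3', 'val'])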

which looks like:

   par1  par2  par3       val
0     0     0     0  0.464625
1     0     0     1  0.481147
2     0     1     0  0.817992
3     0     1     1  0.639930
4     1     0     0  0.035160
5     1     0     1  0.549517
6     1     1     0  0.172746
7     1     1     1  0.855064

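I would like to know the best way to select rows by specifying some column==value conditions. The conditions need not cover all the columns, nor always the same columns, nor even the same number of columns. I think a dict is a fairly natural way to specify them: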

conditions = {'par1':1, 'par3':0}

In this case, any value of df.par2 is acceptable.

`df.isin()`

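I know that df.isin() accepts a dict argument and can be combined with all(1), as shown in the docs (the last code block of that section). The problem is that the values in columns not named in the dict passed to df.isin() come out as False, so the subsequent call to all(1) yields an empty DataFrame. (One workaround would be to add all the missing columns, with all their possible values as criteria, but that does not sound like a good solution.)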
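To make the behaviour described above concrete, here is a minimal sketch. It assumes a deterministic stand-in for the toy frame (val is fixed at 0.5 instead of random), and the names allowed, naive_mask and mask are only illustrative. It also shows one possible way around the problem, restricting the frame to the condition columns before calling isin(), rather than adding the missing columns to the dict.

```python
from itertools import product

import pandas as pd

# Deterministic stand-in for the toy frame (val is fixed, not random).
df = pd.DataFrame([[i, j, k, 0.5] for i, j, k in product(range(2), repeat=3)],
                  columns=['par1', 'par2', 'par3', 'val'])
conditions = {'par1': 1, 'par3': 0}

# isin() expects list-like values for each dict key.
allowed = {col: [value] for col, value in conditions.items()}

# Columns missing from the dict (par2, val) are False everywhere,
# so requiring every column to be True matches no row at all.
naive_mask = df.isin(allowed).all(axis=1)

# Restricting the frame to the condition columns first avoids that.
mask = df[list(allowed)].isin(allowed).all(axis=1)
selected = df[mask]
print(selected)
```

Because each dict value is a list, this form would also accept several allowed values per column, which a plain == comparison does not.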

`df.query()`

While writing this question, I thought of another attempt. This one looks much better: build the query automatically from the conditions dict.

df.query(' & '.join(['({} == {})'.format(k, v)
                     for k, v in conditions.items()]))

It works as expected...

   par1  par2  par3       val
4     1     0     0  0.035160
6     1     1     0  0.172746
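One caveat with building the query string this way is that string values would be pasted in unquoted and parsed as column names. A hedged sketch of one way around it is to format each value with repr(). The frame and its columns (kind, par1, val) are made up for illustration, and the sketch assumes plain Python scalars as values and column names that are valid Python identifiers.

```python
import pandas as pd

# Hypothetical frame with a string column, for illustration only.
df = pd.DataFrame({'kind': ['a', 'b', 'a', 'b'],
                   'par1': [0, 0, 1, 1],
                   'val': [10, 20, 30, 40]})
conditions = {'kind': 'a', 'par1': 1}

# {v!r} inserts repr(v): 'a' stays quoted in the expression, 1 stays a bare number.
expr = ' & '.join(f'({k} == {v!r})' for k, v in conditions.items())
result = df.query(expr)
print(expr)
print(result)
```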

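Still, I am not entirely convinced, and I wonder whether there is a more natural/proper/clear way to do this... Pandas is so big that I always have the impression I am missing the right way of doing things... :P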

Answer 1

You could make a Series out of conditions and select only those columns:

>>> df[(df[list(conditions)] == pd.Series(conditions)).all(axis=1)]
   par1  par2  par3       val
4     1     0     0  0.937192
6     1     1     0  0.536029

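This works because, once conditions is a Series, its index (the dict keys) is aligned with the column labels, so each column is compared against its own target value, which is exactly the comparison we need: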

>>> df[list(conditions)]
   par1  par3
0     0     0
1     0     1
2     0     0
3     0     1
4     1     0
5     1     1
6     1     0
7     1     1
>>> df[list(conditions)] == pd.Series(conditions)
    par1   par3
0  False   True
1  False  False
2  False   True
3  False  False
4   True   True
5   True  False
6   True   True
7   True  False
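The steps above could be wrapped in a small helper so that any conditions dict can be applied. This is only a sketch: the name select_rows is hypothetical, the empty-dict guard is an added assumption about the desired behaviour, and the frame is a deterministic stand-in for the toy one (val fixed at 0.5).

```python
from itertools import product

import pandas as pd


def select_rows(df, conditions):
    """Return the rows of df where every column == value pair in conditions holds."""
    if not conditions:
        return df  # no conditions: nothing to filter on
    cols = list(conditions)
    # The Series index (the dict keys) is aligned with the column labels,
    # so each column is compared against its own target value.
    mask = (df[cols] == pd.Series(conditions)).all(axis=1)
    return df[mask]


df = pd.DataFrame([[i, j, k, 0.5] for i, j, k in product(range(2), repeat=3)],
                  columns=['par1', 'par2', 'par3', 'val'])

two = select_rows(df, {'par1': 1, 'par3': 0})  # two conditions
one = select_rows(df, {'par2': 1})             # a different column, one condition
print(two)
print(one)
```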
