Showing columns in pandas

Question

I have a term x document matrix in pandas (built from a CSV) of the form:

cheese, milk, bread, butter
0,2,1,0
1,1,0,0
1,1,1,1
0,1,0,1

So suppose I say, "give me the columns at index 1 and 2 where the values in a given row are both > 0".

I want to end up with this:

cheese, milk,
[omitted]
1,1
1,1
[omitted]

That way I can take number of matching rows / number of documents and get a frequent itemset, i.e. (cheese, milk) --[2/4 support]

I have tried this approach, following directions in a separate Stack Overflow thread:

fil_df.select([fil_df.columns[1] > 0 and fil_df.columns[2] > 0], [fil_df.columns[1], fil_df.columns[2]])

But sadly it isn't working for me. I get the error:

TypeError: unorderable types: str() > int()

I don't know how to fix this, because I can't get the cells of my rows to be integers when I build the dataframe from the CSV.

Answer 1

You can use iloc with boolean indexing:

# get the 1st and 2nd columns (positions 0 and 1)
subset = df.iloc[:, [0,1]]
print (subset)
   cheese  milk
0       0     2
1       1     1
2       1     1
3       0     1

# boolean mask
print ((subset > 0))
   cheese  milk
0   False  True
1    True  True
2    True  True
3   False  True

# check, row by row, whether all selected values are True
print ((subset > 0).all(1))
0    False
1     True
2     True
3    False
dtype: bool

# get the names of the first and second columns
print (df.columns[[0,1]])
Index(['cheese', 'milk'], dtype='object')

# .ix was removed in pandas 1.0; .loc takes the boolean row mask and the column labels
print (df.loc[(subset > 0).all(1), df.columns[[0,1]]])
   cheese  milk
1       1     1
2       1     1

Answer 2

df.loc[[1, 2], df.loc[[1, 2]].gt(0).all()]
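
A small sketch of what this one-liner appears to do, assuming `df` holds the sample data from the question. Here `[1, 2]` are row labels, not column positions: the expression keeps those two rows and then keeps only the columns whose values are all > 0 within them. On the sample data this happens to give the same cheese/milk result, but it is a different selection from filtering rows on two chosen columns. The names `rows`, `col_mask`, `picked` and `other` are illustrative.

```python
import pandas as pd

# Sample data from the question, built directly for a self-contained example.
df = pd.DataFrame(
    {
        "cheese": [0, 1, 1, 0],
        "milk": [2, 1, 1, 1],
        "bread": [1, 0, 1, 0],
        "butter": [0, 0, 1, 1],
    }
)

rows = [1, 2]                          # row labels, not column positions
col_mask = df.loc[rows].gt(0).all()    # per column: all values in these rows > 0?
picked = df.loc[rows, col_mask]        # boolean Series selects the columns

# With a different pair of rows, a different set of columns survives.
other = df.loc[[0, 1], df.loc[[0, 1]].gt(0).all()]

print(col_mask)
print(picked)
print(other)
```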

python

indexing

multiple-columns

conditional-statements

pandas