Select columns based on row values in pandas and use the column indexes for subsetting

Question

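I have the following DataFrame, in which the first three columns have specific names ('Col1' - 'Col3') that should not be changed, and the numbered columns range from 3 to 7.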

import pandas as pd

data = [[0, 0.5, 0.5, 1, 0, 1, 0, 0],
        [1, 0.5, 0.5, 1, 1, 0, 1, 1],
        [2, 0.5, 0.5, 1, 1, 0, 1, 1]]
df = pd.DataFrame(data)
df = df.rename(columns = {0: 'Col1', 1:'Col2', 2: 'Col3'})

I want to select all numbered columns (column indexes 3-7) that have the value 1 in the first row.

df2 = df.loc[0, df.iloc[0, 3:] == 1]

This raises the following error: AssertionError

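After that, I want to use the index of df2, which identifies the columns that meet the value-1 criterion in the first row (e.g. columns 3 and 5), to select those same columns in the second row and check whether they also have the value 1.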

df3 = df.loc[1, df.iloc[1, df2.index] == 1]

This raises the following error: IndexError: .iloc requires numeric indexers, got [3 5]

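The final expected output is that only column index 3 satisfies the value-1 condition in the second row, because in the first row only column indexes 3 and 5 have the value 1.

How can I do this?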

Answer 1

Use:

df1 = df.iloc[:, 3:]
fin = df1.columns[(df1.iloc[0] == 1) & (df.iloc[1, 3:] == 1)]
print(fin)
# Index([3], dtype='object')
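
The same idea extends to more than two rows. A minimal sketch, assuming the frame from the question; `block` and `all_ones` are illustrative names that do not appear in the original answer:

```python
import pandas as pd

data = [[0, 0.5, 0.5, 1, 0, 1, 0, 0],
        [1, 0.5, 0.5, 1, 1, 0, 1, 1],
        [2, 0.5, 0.5, 1, 1, 0, 1, 1]]
df = pd.DataFrame(data).rename(columns={0: 'Col1', 1: 'Col2', 2: 'Col3'})

# Numbered columns only (positions 3 onward), rows 0 and 1 by position.
block = df.iloc[[0, 1], 3:]

# True for a column only if every selected row holds the value 1.
all_ones = block.eq(1).all(axis=0)

# Convert the mask to a NumPy array before indexing the column Index.
fin = block.columns[all_ones.to_numpy()]
print(list(fin))  # [3]
```

Adding further row positions to the list passed to `iloc` tightens the condition without changing the rest of the code.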

Original solution:

out = df.columns[3:][df.iloc[0, 3:] == 1]
s = df.loc[1, out]

fin = s.index[s == 1]
print(fin)
# Index([3], dtype='object')
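
If positional indexing with `.iloc` is preferred for the second step, the column labels have to be translated into integer positions first. The `IndexError` in the question appears to come from passing an object-dtype index of labels to `.iloc`. A minimal sketch, assuming the frame from the question and using `Index.get_indexer` for the translation; the variable names are illustrative:

```python
import pandas as pd

data = [[0, 0.5, 0.5, 1, 0, 1, 0, 0],
        [1, 0.5, 0.5, 1, 1, 0, 1, 1],
        [2, 0.5, 0.5, 1, 1, 0, 1, 1]]
df = pd.DataFrame(data).rename(columns={0: 'Col1', 1: 'Col2', 2: 'Col3'})

# Labels of the numbered columns whose first-row value is 1.
first_row = df.iloc[0, 3:]
labels = first_row.index[(first_row == 1).to_numpy()]

# Translate the labels into integer positions so that .iloc accepts them.
positions = df.columns.get_indexer(labels)

# Second row, restricted to those positions; keep the labels that hold 1.
second_row = df.iloc[1, positions]
fin = second_row.index[(second_row == 1).to_numpy()]
print(list(fin))  # [3]
```

In this frame the labels 3 and 5 happen to sit at positions 3 and 5, but the conversion keeps the code correct if the columns are ever reordered.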

Answer 2

One option:

# first row of columns to test (could be a fixed list)
cols = df.loc[0,3:7]
# if not 1, then drop
df2 = df.drop(cols[cols.ne(1)].index, axis=1)

Output:

   Col1  Col2  Col3  3  5
0     0   0.5   0.5  1  1
1     1   0.5   0.5  1  0
2     2   0.5   0.5  1  0

Alternative

To just get the names of the columns that contain 1:

cols = df.loc[0,3:7] # first row, columns 3 to 7
# or with iloc
# cols = df.iloc[0,3:]

cols[cols.eq(1)].index
# Index([3, 5], dtype='object')
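
The mask `cols.eq(1)` above covers only the numbered columns. A boolean Series passed to `.loc` is aligned against the axis labels, so a mask that should be usable directly on `df` needs an entry for every column; this is probably related to why the first attempt in the question fails. A minimal sketch, assuming the frame from the question, that pads the mask with `False` for the named columns (`partial`, `mask` and `selected` are illustrative names):

```python
import pandas as pd

data = [[0, 0.5, 0.5, 1, 0, 1, 0, 0],
        [1, 0.5, 0.5, 1, 1, 0, 1, 1],
        [2, 0.5, 0.5, 1, 1, 0, 1, 1]]
df = pd.DataFrame(data).rename(columns={0: 'Col1', 1: 'Col2', 2: 'Col3'})

cols = df.iloc[0, 3:]   # first row, numbered columns only
partial = cols.eq(1)    # mask indexed by the labels 3 to 7

# Extend the mask to every column label; the named columns count as False.
mask = partial.reindex(df.columns, fill_value=False)

# All rows, restricted to the columns whose first-row value is 1.
selected = df.loc[:, mask]
print(list(selected.columns))  # [3, 5]
```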
