How to insert a string value in a list that is in a cell in pandas dataframe?

Question

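I am trying to create tags from the dummy variables in my dataset. I created a column "Tags_col", and each time my nested for loop goes through a row, if a category has a 1, I want that category to be included in a list for that row.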

Like this:

Dog   Cat   Rabbit   Tags_col
 0     1      1      ['Cat','Rabbit']
 1     0      0      ['Dog']

So far I have this:

for x in range(len(df)):
   for col in df.columns:
       if df.loc[x,col] == 1:
           df.loc[x, "Tags_col"] = col

However, this only puts the first category that the for loop finds into Tags_col.

Thanks.

Answer 1

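Use a list comprehension over the boolean DataFrame obtained by comparing with 1, using each row's mask to filter an array created from the column names: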

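# Array of column names, indexed below with each row's boolean mask
cols = df.columns.to_numpy()
# df.eq(1) marks the cells equal to 1; each mask row selects the matching names
df['Tags_col'] = [list(cols[x]) for x in df.eq(1).to_numpy()]
print(df)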

   Dog  Cat  Rabbit       Tags_col
0    0    1       1  [Cat, Rabbit]
1    1    0       0          [Dog]
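
To try the boolean-mask approach end to end, here is a minimal self-contained sketch. The sample DataFrame mirrors the table in the question, and the `dummy_cols` list is an assumption added for illustration so that an already existing Tags_col column is not scanned along with the dummy columns:

```python
import pandas as pd

# Sample data mirroring the question's table (assumed for illustration).
df = pd.DataFrame({"Dog": [0, 1], "Cat": [1, 0], "Rabbit": [1, 0]})

# Hypothetical list of the dummy columns; restricting to it keeps any
# pre-existing Tags_col out of the comparison.
dummy_cols = ["Dog", "Cat", "Rabbit"]
names = df[dummy_cols].columns.to_numpy()
mask = df[dummy_cols].eq(1).to_numpy()

# Each row of the mask selects the column names whose value is 1.
df["Tags_col"] = [list(names[row]) for row in mask]
print(df)
```

Restricting to `dummy_cols` is a design choice rather than a requirement; with only dummy columns in the frame, using `df.columns` directly as in the answer behaves the same.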

If performance is not important, use DataFrame.apply:

# For each row, keep the index labels (column names) whose value is 1
df['Tags_col'] = df.apply(lambda x: list(x.index[x==1]), axis=1)
print(df)
   Dog  Cat  Rabbit       Tags_col
0    0    1       1  [Cat, Rabbit]
1    1    0       0          [Dog]
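
For comparison, the loop in the question assigns a single column name to Tags_col on every match, so each assignment replaces the previous one instead of growing a list. A loop-based sketch that accumulates the names first and assigns the column once might look like this; the sample data and the `dummy_cols` and `tags` names are assumptions for illustration, and this is likely slower than the approaches above on large frames:

```python
import pandas as pd

# Sample data mirroring the question's table (assumed for illustration).
df = pd.DataFrame({"Dog": [0, 1], "Cat": [1, 0], "Rabbit": [1, 0]})
dummy_cols = ["Dog", "Cat", "Rabbit"]  # hypothetical list of dummy columns

tags = []
for _, row in df[dummy_cols].iterrows():
    # Collect every category whose dummy value is 1 for this row.
    tags.append([col for col in dummy_cols if row[col] == 1])

# Assign the whole column at once, one list per row.
df["Tags_col"] = tags
print(df)
```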

python

pandas

data-cleaning