How to get the column names with a value greater than zero as the multi-labels for each row

Question

I have a DataFrame whose columns are tag names and whose values are 0 or greater, as shown below.

   .net  2007  actionscript-3  activerecord  air  ajax
0     0     0               0             1    1     1
1     0     0               0             1    1     1
2     0     0               0             1    1     1
3     2     2               2             2    0     0
4     2     2               2             2    0     0
5     2     2               2             2    0     0

For each row, I need to collect the names of the columns whose value is greater than zero into a single column, separated by spaces, like this:

0   activerecord air ajax
1   activerecord air ajax
2   activerecord air ajax
3   .net 2007 actionscript-3 activerecord
4   .net 2007 actionscript-3 activerecord
5   .net 2007 actionscript-3 activerecord

Example: in the first row the activerecord, air, and ajax columns have the value 1, so those three names should appear, space-separated, in one column of the DataFrame.

This is for a multi-label classification problem.

Answer 1

Try this:

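# Series.iteritems() was removed in pandas 2.0; items() is the replacement.
# "v > 0" matches the requirement (greater than zero) more literally than a truthiness test.
print(df.apply(lambda x: " ".join([k for k, v in x.items() if v > 0]), axis=1))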

Output:

0                    activerecord air ajax
1                    activerecord air ajax
2                    activerecord air ajax
3    .net 2007 actionscript-3 activerecord
4    .net 2007 actionscript-3 activerecord
5    .net 2007 actionscript-3 activerecord
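
Since the question asks for the labels in a separate column, here is a minimal self-contained sketch of the same idea that attaches the result to the frame. The sample data is rebuilt from the question; the helper name `positive_columns` and the column name `labels` are my own choices for illustration, not something from the original post.

```python
import pandas as pd

# Sample data rebuilt from the question (values are tag counts).
df = pd.DataFrame(
    {
        ".net": [0, 0, 0, 2, 2, 2],
        "2007": [0, 0, 0, 2, 2, 2],
        "actionscript-3": [0, 0, 0, 2, 2, 2],
        "activerecord": [1, 1, 1, 2, 2, 2],
        "air": [1, 1, 1, 0, 0, 0],
        "ajax": [1, 1, 1, 0, 0, 0],
    }
)


def positive_columns(row):
    """Join the names of the columns whose value in this row is > 0."""
    return " ".join(name for name, value in row.items() if value > 0)


# Compute the labels first, then attach them, so the new column
# is never scanned as if it were one of the tag columns.
labels = df.apply(positive_columns, axis=1)
result = df.assign(labels=labels)

print(result["labels"])
```

Using `assign` returns a new frame and leaves `df` untouched, which is convenient if you still need the numeric columns on their own as the multi-label targets.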

Answer 2

You can use df.apply:

df.apply(lambda x:' '.join(x.index[x!=0]) , axis=1)

Or use df.T, which is shorthand for df.transpose():

df.T.apply(lambda x: ' '.join(x.index[x!=0]))
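
If `apply` with a lambda turns out to be slow on a wide frame, one alternative (not from the answer above, just a sketch) is to build the boolean mask once and index an array of column names with it. The variable names `mask`, `names`, and `labels` are illustrative, and the small frame below is a cut-down version of the question's data.

```python
import pandas as pd

# Cut-down sample in the shape of the question's data.
df = pd.DataFrame(
    {
        ".net": [0, 0, 2],
        "activerecord": [1, 1, 2],
        "air": [1, 1, 0],
    }
)

mask = df.gt(0).to_numpy()     # boolean array, one row per DataFrame row
names = df.columns.to_numpy()  # column names as a NumPy array

# Boolean-index the column names with each row of the mask.
labels = pd.Series(
    [" ".join(map(str, names[row])) for row in mask],
    index=df.index,
)

print(labels)
```

This still loops over rows in Python, but it avoids constructing a Series per row, which is usually where `apply(axis=1)` spends its time. I have not benchmarked it on your data, so treat the speed claim as something to measure.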

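Here is another trick: reshape with df.stack, use where to keep only the values not equal to 0, drop the resulting NaN entries with dropna, and then group with groupby. Note that this version joins with commas; use ' '.join for the space-separated output the question asks for.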

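# Use a new name instead of reassigning df, so the original frame stays available.
stacked = df.stack()
(stacked.where(stacked != 0).dropna().reset_index()
    .groupby('level_0')['level_1'].apply(','.join))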

level_0
0                    activerecord,air,ajax
1                    activerecord,air,ajax
2                    activerecord,air,ajax
3    .net,2007,actionscript-3,activerecord
4    .net,2007,actionscript-3,activerecord
5    .net,2007,actionscript-3,activerecord
Name: level_1, dtype: object

Alternatively, groupby can group directly by index level, and pd.MultiIndex.get_level_values retrieves the level-1 index values (the column names):

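# Use a new name instead of reassigning df, so the original frame stays available.
stacked = df.stack()
(stacked.where(stacked != 0).dropna().groupby(level=0)
    .apply(lambda x: ','.join(x.index.get_level_values(1))))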

0                    activerecord,air,ajax
1                    activerecord,air,ajax
2                    activerecord,air,ajax
3    .net,2007,actionscript-3,activerecord
4    .net,2007,actionscript-3,activerecord
5    .net,2007,actionscript-3,activerecord
dtype: object
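
One caveat with the stack/groupby variants: a row with no positive values has nothing left after filtering, so it drops out of the grouped result entirely. The sketch below, on a small made-up frame (the data and the name `labels` are assumptions for illustration), joins with spaces as the question requested and uses `reindex` to bring such rows back as empty strings.

```python
import pandas as pd

# Hypothetical data: row 1 has no positive values at all.
df = pd.DataFrame({"air": [1, 0, 0], "ajax": [1, 0, 2]})

stacked = df.stack()             # Series indexed by (row label, column name)
positive = stacked[stacked > 0]  # keep only the strictly positive entries

labels = (
    positive.groupby(level=0)
    .apply(lambda s: " ".join(s.index.get_level_values(1)))
    # Rows without any positive value vanished in the groupby;
    # restore them with an empty label.
    .reindex(df.index, fill_value="")
)

print(labels)
```

The `apply(axis=1)` versions do not have this problem, because they visit every row and simply return an empty string when nothing matches.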
