Change column headers to be values in rows
Suppose I have a dataset with this structure:
name doggo floofer puppo pupper
A None floofer None None
B doggo None None None
C None None puppo None
D None None None pupper
E doggo floofer None None
F None None puppo pupper
G None None None None
I want a new column named dog_stage that holds the stage values (doggo, floofer, puppo, pupper) found in each row.
The final result should look like this:
name dog_stage
A floofer
B doggo
C puppo
D pupper
E doggo, floofer
F puppo, pupper
G None
with the original stage columns dropped.
For both solutions, first keep only the necessary columns:
df = df[['name', 'doggo', 'floofer', 'puppo', 'pupper']].copy()
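The first solution joins the column names wherever the cell is not None (either a real NoneType or the string 'None'), using DataFrame.dot for matrix multiplication with the column names: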
import numpy as np

# Move name to the index, turn any 'None' strings into NaN, then flag the cells that hold a value
df1 = df.set_index('name').replace('None', np.nan).notna()
# True keeps the column name, False contributes ''; str[:-1] drops the trailing comma
df1 = df1.dot(df1.columns + ',').str[:-1].reset_index(name='dog_stage')
print(df1)
name dog_stage
0 A floofer
1 B doggo
2 C puppo
3 D pupper
4 E doggo,floofer
5 F puppo,pupper
6 G
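To make the DataFrame.dot step easier to follow, here is a minimal end-to-end sketch. It assumes the missing cells are stored as the string 'None'; the sample frame and the names `mask` and `result` are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the table in the question;
# missing stages are assumed to be the string 'None'.
df = pd.DataFrame({
    'name':    list('ABCDEFG'),
    'doggo':   ['None', 'doggo', 'None', 'None', 'doggo', 'None', 'None'],
    'floofer': ['floofer', 'None', 'None', 'None', 'floofer', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None', 'puppo', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None', 'pupper', 'None'],
})

# Boolean mask: True where a stage is present in that row.
mask = df.set_index('name').replace('None', np.nan).notna()

# In the dot product each True is multiplied by 'colname,' (giving the name)
# and each False by 'colname,' (giving ''); the row-wise sum then
# concatenates the surviving names.
result = (mask.dot(mask.columns + ',')
              .str[:-1]                      # drop the trailing comma
              .reset_index(name='dog_stage'))
print(result)
```

Rows with no stage at all, such as G, end up as an empty string rather than None.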
Another idea is to join the non-None values of each row in a lambda function:
df1 = (df.set_index('name')
         .replace('None', np.nan)
         .apply(lambda x: ','.join(x.dropna()), axis=1)
         .reset_index(name='dog_stage'))
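If the output should match the layout asked for in the question more closely (a ', ' separator and a missing value for rows without any stage), the row-wise variant can be adjusted as sketched below. This is an assumption about the desired format, not part of the original answer, and the sample frame is again hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; missing stages are assumed to be the string 'None'.
df = pd.DataFrame({
    'name':    list('ABCDEFG'),
    'doggo':   ['None', 'doggo', 'None', 'None', 'doggo', 'None', 'None'],
    'floofer': ['floofer', 'None', 'None', 'None', 'floofer', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None', 'puppo', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None', 'pupper', 'None'],
})

result = (df.set_index('name')
            .replace('None', np.nan)
            # Join the values that are present; `or None` turns the empty
            # string produced by an all-missing row into a missing value.
            .apply(lambda row: ', '.join(row.dropna()) or None, axis=1)
            .reset_index(name='dog_stage'))
print(result)
```

The row-wise apply is usually slower than the dot approach on large frames, but it makes the separator and the empty-row handling easy to change.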