How to merge dataframe rows into a single row with all row values concatenated for each column?
I have a df like this:
| col1 | col2 | col3
0 | Text1 | a,b ,c | klra-tk³,t54 ?
1 | Text2 | NaN | gimbal3, gimbal4
2 | Text3 | a,k,m | NaN
I want to get a single row in which each column holds all of that column's unique values, with NaN ignored, for example:
| col1 | col2 | col3
0 | Text1, Text2, Text3 | a,b,c,k,m | klra-tk³,t54,gimbal3, gimbal4
How can I do this with pandas?
Use a custom function with Series.str.split and DataFrame.stack, remove duplicates with Series.drop_duplicates, remove missing values with Series.dropna, then join the values with "," and convert the resulting Series to a one-row DataFrame with Series.to_frame followed by a transpose:
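import pandas as pd

# Split each cell on commas, stack the pieces into one long Series,
# strip stray whitespace (e.g. "b " or " gimbal4"), then drop duplicates and NaN before joining.
f = lambda x: ','.join(x.str.split(',', expand=True).stack().str.strip().drop_duplicates().dropna())
df = df.apply(f).to_frame().T
print(df)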
col1 col2 col3
0 Text1,Text2,Text3 a,b,c,k,m klra-tk,t54,gimbal3,gimbal4
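To see what each step of the chain contributes, here is a minimal sketch that runs it on a single column. The Series `s` is a made-up sample mirroring col2 from the question, and the intermediate variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring col2 from the question
s = pd.Series(["a,b ,c", np.nan, "a,k,m"])

# One column per comma-separated piece; the NaN row stays all-NaN
parts = s.str.split(",", expand=True)

# Reshape the pieces into one long Series, row by row
stacked = parts.stack()

# Remove stray whitespace such as the trailing space in "b "
cleaned = stacked.str.strip()

# Keep the first occurrence of each value and discard missing entries
unique = cleaned.drop_duplicates().dropna()

result = ",".join(unique)
print(result)
```

Because drop_duplicates keeps the first occurrence, the values stay in the order in which they first appear in the column.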
Or use a list comprehension:
# Same helper as above, applied column by column in a list comprehension
f = lambda x: ','.join(x.str.split(',', expand=True).stack().str.strip().drop_duplicates().dropna())
df = pd.DataFrame([[f(df[x]) for x in df.columns]], columns=df.columns)
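For a self-contained run, the sketch below rebuilds the sample frame from the question and uses Series.explode in place of split(expand=True) plus stack. This is a swapped-in alternative, not the approach shown above. The helper name `join_unique` is hypothetical, and the sample data is an assumption reconstructed from the question's table (without the trailing "?" in col3).

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption, not the asker's real frame)
df = pd.DataFrame({
    "col1": ["Text1", "Text2", "Text3"],
    "col2": ["a,b ,c", np.nan, "a,k,m"],
    "col3": ["klra-tk³,t54", "gimbal3, gimbal4", np.nan],
})


def join_unique(col: pd.Series) -> str:
    """Hypothetical helper: join a column's unique comma-separated values."""
    # Drop NaN first, split each cell into a list, then give every piece its own row
    pieces = col.dropna().str.split(",").explode().str.strip()
    # Keep the first occurrence of each value, preserving order of appearance
    return ",".join(pieces.drop_duplicates())


# apply() calls the helper once per column; to_frame().T turns the result into one row
out = df.apply(join_unique).to_frame().T
print(out)
```

A named function makes it easier to add further cleaning steps, such as dropping empty strings, than a one-line lambda would.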