Python 数据操作:从一列拆分数据以在同一数据框中生成更多行
Python data Manipulation: Splitting Data from one column to make more rows in the same dataframe
所以我的输入是:
Col1 Col2 Col3 Col4
0 123 abc,def,ghi Country1 XXX
1 456 pqr,stu Country2 XX
2 789 xyz Country2 YY
我希望输出为:
Col1 Col2 Col3 Col4
0 abc 123 Country1 XXX
1 def 123 Country1 XXX
2 ghi 123 Country1 XXX
3 pqr 456 Country2 XX
4 stu 456 Country2 XX
5 xyz 789 Country2 YY
请问执行此操作最符合 pythonic 的方法是什么?谢谢 vm!
你可以用str.split
with stack
for creating Series
for join
到原来的DataFrame
:
print (df.Col2
.str
.split(',',expand=True)
.stack()
.reset_index(drop=True, level=1)
.rename('Col2'))
0 abc
0 def
0 ghi
1 pqr
1 stu
2 xyz
Name: Col2, dtype: object
print (df.drop('Col2', axis=1)
.join
(
df.Col2
.str
.split(',',expand=True)
.stack()
.reset_index(drop=True, level=1)
.rename('Col2')
))
Col1 Col3 Col4 Col2
0 123 Country1 XXX abc
0 123 Country1 XXX def
0 123 Country1 XXX ghi
1 456 Country2 XX pqr
1 456 Country2 XX stu
2 789 Country2 YY xyz
使用 extractall
和 join
d1 = df.Col2.str.extractall('([^,]+)') \
.rename(columns={0: 'Col2'}) \
.reset_index(1, drop=True)
df.drop('Col2', 1).join(d1).reindex_axis(df.columns, 1)
所以我的输入是:
Col1 Col2 Col3 Col4
0 123 abc,def,ghi Country1 XXX
1 456 pqr,stu Country2 XX
2 789 xyz Country2 YY
我希望输出为:
Col1 Col2 Col3 Col4
0 abc 123 Country1 XXX
1 def 123 Country1 XXX
2 ghi 123 Country1 XXX
3 pqr 456 Country2 XX
4 stu 456 Country2 XX
5 xyz 789 Country2 YY
请问执行此操作最符合 pythonic 的方法是什么?谢谢 vm!
你可以用str.split
with stack
for creating Series
for join
到原来的DataFrame
:
print (df.Col2
.str
.split(',',expand=True)
.stack()
.reset_index(drop=True, level=1)
.rename('Col2'))
0 abc
0 def
0 ghi
1 pqr
1 stu
2 xyz
Name: Col2, dtype: object
print (df.drop('Col2', axis=1)
.join
(
df.Col2
.str
.split(',',expand=True)
.stack()
.reset_index(drop=True, level=1)
.rename('Col2')
))
Col1 Col3 Col4 Col2
0 123 Country1 XXX abc
0 123 Country1 XXX def
0 123 Country1 XXX ghi
1 456 Country2 XX pqr
1 456 Country2 XX stu
2 789 Country2 YY xyz
使用 extractall
和 join
d1 = df.Col2.str.extractall('([^,]+)') \
.rename(columns={0: 'Col2'}) \
.reset_index(1, drop=True)
df.drop('Col2', 1).join(d1).reindex_axis(df.columns, 1)