How to use groupby in Python to merge text while keeping the other rows fixed?
I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'Date':['2022-01-01', '2022-01-01','2022-01-01','2022-02-01','2022-02-01',
'2022-03-01','2022-03-01','2022-03-01'],
'Type': ['R','R','R','P','P','G','G','G'],
'Class':[1,1,1,0,0,2,2,2],
'Text':['Hello-','I would like.','to be merged.','with all other.',
'sentences that.','belong to my same.','group.','thanks a lot.']})
df.index =[1,1,1,2,2,3,3,3]
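What I would like to do is group by the index so that the Text column is joined, while keeping only the first row of the other columns.
I tried the following two solutions without success. Perhaps I should combine them, but I don't know how to do it.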
# Approach 1
df.groupby([df.index],as_index=False).agg(lambda x : x.sum() if x.dtype=='float64' else ' '.join(x))
# Approach 2
df.groupby([df.index], as_index=False).agg({'Date': 'first',
'Type': 'first', 'Class': 'first', 'Test': 'join'})
The result should be:
Date Type Class Text
2022-01-01 R 1 Hello. I would like to be merged.
2022-02-01 P 0 with all other sentences that.
2022-03-01 G 2 belong to my same. group. thanks a lot.
Can anyone help me with this?
Thanks!
My idea would be to take the second approach and aggregate the text into a list, then simply join the individual strings like this:
new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
'Type': 'first', 'Class': 'first', 'Text': list})
new_df['Text'] = new_df['Text'].str.join('')
print(new_df)
Output:
Date Type Class Text
0 2022-01-01 R 1 Hello-I would like.to be merged.
1 2022-02-01 P 0 with all other.sentences that.
2 2022-03-01 G 2 belong to my same.group.thanks a lot.
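Note that the fragments above are concatenated with no separator, whereas the expected result in the question has a space between sentences. A minimal sketch of the same list-then-join idea with a space separator follows. The choice of `' '` as separator is an assumption about what the asker wants, and `reset_index(drop=True)` is used here only to get a plain 0..n-1 index.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01',
             '2022-02-01', '2022-03-01', '2022-03-01', '2022-03-01'],
    'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
    'Class': [1, 1, 1, 0, 0, 2, 2, 2],
    'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
             'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.'],
})
df.index = [1, 1, 1, 2, 2, 3, 3, 3]

# Step 1: keep the first value of the fixed columns, collect Text into a list.
spaced_df = (
    df.groupby(level=0)
      .agg({'Date': 'first', 'Type': 'first', 'Class': 'first', 'Text': list})
      .reset_index(drop=True)
)

# Step 2: join each list of fragments with a single space (assumed separator).
spaced_df['Text'] = spaced_df['Text'].str.join(' ')
print(spaced_df)
```

This still keeps the punctuation of each fragment as it is (for example the trailing `-` in `Hello-`), so it does not reproduce the expected result character for character.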
I found out you can also do it in one statement (same approach):
new_df = df.groupby([df.index], as_index=False).agg({'Date': 'first',
'Type': 'first', 'Class': 'first', 'Text': ''.join})
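As far as I can tell, the reason Approach 2 in the question fails is twofold: the key `'Test'` is not a column of the dataframe, and the string `'join'` is not the name of a groupby aggregation, whereas a callable such as `''.join` is simply applied to each group's values. The sketch below illustrates this. The helper name `agg_map` is made up for illustration, and building it with a dict comprehension is just one way to avoid listing every fixed column by hand.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'Date': ['2022-01-01', '2022-01-01', '2022-01-01', '2022-02-01',
             '2022-02-01', '2022-03-01', '2022-03-01', '2022-03-01'],
    'Type': ['R', 'R', 'R', 'P', 'P', 'G', 'G', 'G'],
    'Class': [1, 1, 1, 0, 0, 2, 2, 2],
    'Text': ['Hello-', 'I would like.', 'to be merged.', 'with all other.',
             'sentences that.', 'belong to my same.', 'group.', 'thanks a lot.'],
})
df.index = [1, 1, 1, 2, 2, 3, 3, 3]

# Hypothetical helper: 'first' for every column except Text.
agg_map = {col: 'first' for col in df.columns if col != 'Text'}

# Passing the *string* 'join' is expected to raise, because pandas looks the
# name up as a groupby method and no such method exists.
try:
    df.groupby(level=0).agg({**agg_map, 'Text': 'join'})
    string_name_worked = True
except Exception as exc:
    string_name_worked = False
    print(type(exc).__name__, exc)

# Passing the *callable* ''.join works: it receives each group's Text values.
new_df = (
    df.groupby(level=0)
      .agg({**agg_map, 'Text': ''.join})
      .reset_index(drop=True)
)
print(new_df)
```

Swapping `''.join` for `' '.join` in the last call puts a space between the fragments.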