Tokenize multiple sentences to rows in python pandas
I have a text data frame like this:
id text
1 Thanks. I appreciate your help. I really like this chat service as it is very convenient. I hope you have a wonderful day! thanks!
2 Got it. Thanks for the help; good nite.
I want to split the text into sentences and match each sentence with its id. My expected output is:
id text
1 Thanks.
1 I appreciate your help.
1 I really like this chat service as it is very convenient.
1 I hope you have a wonderful day!
1 thanks!
2 Got it.
2 Thanks for the help;
2 good nite.
Is there an nltk function that can solve this?
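First split, then use explode. Note that explode requires pandas 0.25 or later. Splitting on the punctuation characters drops them from the result, and the final filter removes the empty string left after the trailing punctuation: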
df.assign(text=df.text.str.split('[.!;]')).explode('text').loc[lambda x : x.text!='']
Out[181]:
text id
0 Thanks 1
0 I appreciate your help 1
0 I really like this chat service as it is ver... 1
0 I hope you have a wonderful day 1
0 thanks 1
1 Got it 2
1 Thanks for the help 2
1 good nite 2
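If the punctuation should stay attached to each sentence, as in the expected output of the question, one possible variation is to split on the whitespace that follows a '.', '!' or ';' instead of on the punctuation itself. The sketch below is an assumption-laden alternative, not the answer's method and not an nltk tokenizer: the helper name split_sentences and the pattern SENTENCE_BOUNDARY are made up for illustration, and the rule is naive (it would also split after abbreviations such as "e.g.").

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "text": [
        "Thanks. I appreciate your help. I really like this chat service "
        "as it is very convenient. I hope you have a wonderful day! thanks!",
        "Got it. Thanks for the help; good nite.",
    ],
})

# Hypothetical boundary rule: whitespace preceded by '.', '!' or ';'.
# The lookbehind keeps the punctuation with the sentence before it.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!;])\s+")


def split_sentences(text):
    """Return the non-empty pieces of `text`, split at sentence boundaries."""
    return [part for part in SENTENCE_BOUNDARY.split(text.strip()) if part]


result = (
    df.assign(text=df["text"].map(split_sentences))  # one list per row
      .explode("text")                               # one sentence per row (pandas >= 0.25)
      .reset_index(drop=True)
)
print(result)
```

Because each row's list is built with plain Python before explode is called, the id column is repeated automatically for every sentence of that row.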