Count how many sentences there are in each row within a pandas column
I'm trying to determine how many sentences there are in each row.
Sent
I went out for a walk.
I don't know. I think you're right!
so boring!!!
WTF?
Nothing
Based on the punctuation marks I'm interested in, this is the number of sentences I expect for each row:
Output
1
2
1
1
1
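To get this result, my first idea was to split each row whenever one of the symbols (e.g. ".", "!" or "?") appears, but I don't know how to count the pieces.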
My code is:
import re

def sentence(sent):
    # returns a list of matches: a word character immediately followed by ., ! or ?
    return re.findall(r'[\w][\.!\?]', sent)

df['Sent'] = df['Sent'].apply(sentence)
Can you give me some advice?
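The splitting idea described in the question can be sketched with the standard re module alone: split each string on runs of ".", "!" or "?" and count the non-empty pieces. This is a minimal sketch, not part of the original post; the helper name count_by_split is hypothetical, and treating every run of terminal punctuation as one sentence boundary is an assumption (abbreviations such as "e.g." or decimals such as "3.5" would be miscounted).

```python
import re

# Hypothetical helper (not from the original post): split on runs of . ! ?
# and count the pieces that contain something other than whitespace.
def count_by_split(text: str) -> int:
    pieces = re.split(r'[.!?]+', text)
    # Trailing punctuation leaves an empty last piece, so blanks are skipped.
    return sum(1 for piece in pieces if piece.strip())

rows = [
    "I went out for a walk.",
    "I don't know. I think you're right!",
    "so boring!!!",
    "WTF?",
    "Nothing",
]
print([count_by_split(r) for r in rows])
```

On the sample rows this prints [1, 2, 1, 1, 1]. Because text without terminal punctuation still forms one piece, "Nothing" counts as 1 without any clipping. With a DataFrame the helper would be applied per row, e.g. df['Output'] = df['Sent'].apply(count_by_split).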
One idea: if the last value does not need to be 1, use Series.str.count with a regex that matches a word character followed by an escaped ".", "!" or "?":
df['Output'] = df['Sent'].str.count(r'[\w][\.!\?]')
print (df)
Sent Output
0 I went out for a walk. 1
1 I don't know. I think you're right! 2
2 so boring!!! 1
3 WTF? 1
4 Nothing 0
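Series.str.count counts the regex matches in each string, so for a single string it behaves like len(re.findall(pattern, s)). The sketch below, which uses only the standard re module and a hypothetical helper count_matches, shows why the counts come out as they do: the pattern requires a word character directly before the punctuation, so in "so boring!!!" only "g!" matches (the second and third "!" are preceded by "!"), and "Nothing" has no match at all, which is where the 0 comes from.

```python
import re

# Same pattern as in the answer: a word character followed by ., ! or ?
PATTERN = r'[\w][\.!\?]'

def count_matches(text: str) -> int:
    # Number of non-overlapping matches in one string.
    return len(re.findall(PATTERN, text))

for text in ["so boring!!!", "I don't know. I think you're right!", "Nothing"]:
    # Show the actual matched substrings next to the count.
    print(repr(text), re.findall(PATTERN, text), count_matches(text))
```

For the two-sentence row the matches are "w." (from "know.") and "t!" (from "right!"), giving 2.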
If you need to replace 0 with 1:
df['Output'] = df['Sent'].str.count(r'[\w][\.!\?]').clip(lower=1)
print (df)
Sent Output
0 I went out for a walk. 1
1 I don't know. I think you're right! 2
2 so boring!!! 1
3 WTF? 1
4 Nothing 1
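A self-contained version of the two steps, as a sketch: the post never shows how df is built, so constructing it from the sample rows here is an assumption. clip(lower=1) only raises values below 1, so rows with zero matches become 1 and every other count is left unchanged.

```python
import pandas as pd

# Sample rows from the question; how df was originally created is not shown.
df = pd.DataFrame({'Sent': [
    "I went out for a walk.",
    "I don't know. I think you're right!",
    "so boring!!!",
    "WTF?",
    "Nothing",
]})

pattern = r'[\w][\.!\?]'
counts = df['Sent'].str.count(pattern)  # 0 for rows without any match
df['Output'] = counts.clip(lower=1)     # count unpunctuated text as one sentence
print(df)
```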
Another idea is to use the textstat library:
import textstat
df['Output'] = df['Sent'].apply(textstat.sentence_count)
print (df)
Sent Output
0 I went out for a walk. 1
1 I don't know. I think you're right! 2
2 so boring!!! 1
3 WTF? 1
4 Nothing 1