Count how many sentences there are in each row within a pandas column

I am trying to determine how many sentences there are in each row.

Sent

I went out for a walk.
I don't know. I think you're right!
so boring!!!
WTF?
Nothing

I created a list of the punctuation marks I am interested in, to determine the number of sentences in each row:

Output 
1
2
1
1
1

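To get this result, I first thought of splitting each row wherever one of the symbols (for example ., ! or ?) occurs, but I don't know how to do the counting.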

My code is:

import re

def sentence(sent):
    return re.findall('[\w][\.!\?]', sent)

df['Sent'] = df['Sent'].apply(sentence)

Can you give me some advice?

One idea, if the last value does not have to be 1: use Series.str.count with a regex that matches a word character followed by an escaped ., ! or ?:

df['Output'] = df['Sent'].str.count(r'[\w][\.!\?]')
print (df)
                                  Sent  Output
0               I went out for a walk.       1
1  I don't know. I think you're right!       2
2                         so boring!!!       1
3                                 WTF?       1
4                              Nothing       0
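
The function in the question can also be fixed directly. re.findall returns the list of matches, so taking len() of that list gives the count. The following is a minimal plain-Python sketch of that idea; the names SENTENCE_END and count_sentences are hypothetical and not part of pandas or the original code.

```python
import re

# Hypothetical helper names, used only for illustration.
# A word character followed by ., ! or ? marks the end of one sentence.
SENTENCE_END = re.compile(r"\w[.!?]")

def count_sentences(text):
    # findall returns every non-overlapping match; the count is its length.
    return len(SENTENCE_END.findall(text))

samples = [
    "I went out for a walk.",
    "I don't know. I think you're right!",
    "so boring!!!",
    "WTF?",
    "Nothing",
]

counts = [count_sentences(s) for s in samples]
print(counts)  # [1, 2, 1, 1, 0]
```

With a DataFrame, the same function could be applied as df['Sent'].apply(count_sentences), writing to a new column rather than overwriting Sent. Note that "so boring!!!" counts as 1 because only the first ! is preceded by a word character.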

If you need to replace 0 with 1:

df['Output'] = df['Sent'].str.count(r'[\w][\.!\?]').clip(lower=1)
print (df)
                                  Sent  Output
0               I went out for a walk.       1
1  I don't know. I think you're right!       2
2                         so boring!!!       1
3                                 WTF?       1
4                              Nothing       1
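
The splitting approach mentioned in the question gives a row without punctuation a count of 1 with no extra step. The sketch below is one possible way to do it, not the method used above: it splits on runs of ., ! or ? and counts the pieces that still contain text. The name count_by_split is hypothetical.

```python
import re

def count_by_split(text):
    # Split on runs of sentence-ending punctuation, so "!!!" is one boundary.
    pieces = re.split(r"[.!?]+", text)
    # Ignore empty or whitespace-only pieces (e.g. after the final mark).
    return sum(1 for piece in pieces if piece.strip())

samples = [
    "I went out for a walk.",
    "I don't know. I think you're right!",
    "so boring!!!",
    "WTF?",
    "Nothing",
]

split_counts = [count_by_split(s) for s in samples]
print(split_counts)  # [1, 2, 1, 1, 1]
```

Like the regex count, this treats every period as a sentence boundary, so abbreviations such as "e.g." would be overcounted.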

Another idea is to use the textstat library:

import textstat

df['Output'] = df['Sent'].apply(textstat.sentence_count)
print (df)
                                  Sent  Output
0               I went out for a walk.       1
1  I don't know. I think you're right!       2
2                         so boring!!!       1
3                                 WTF?       1
4                              Nothing       1