Count appearances of a string throughout columns in pandas
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame(["What is the answer",
                   "the answer isn't here, but the answer is 42",
                   "dogs are nice",
                   "How are you"], columns=['words'])
df
words
0 What is the answer
1 the answer isn't here, but the answer is 42
2 dogs are nice
3 How are you
I want to count the number of times a string appears, where it may appear several times within a single row (index).
For example, I want to count how many times "the answer" appears.
I tried:
df.words.str.contains(r'the answer').count()
I was hoping this would give me the count, but the output is 4, and I don't understand why. "the answer" appears only 3 times:
What is **the answer**
**the answer** isn't here, but **the answer** is 42
Note: the search string may appear more than once in a single row.
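A minimal sketch that rebuilds the same DataFrame and prints each intermediate value makes the 4 easier to see: str.contains yields one boolean per row, and Series.count() counts non-missing values, not True values.

```python
import pandas as pd

df = pd.DataFrame(["What is the answer",
                   "the answer isn't here, but the answer is 42",
                   "dogs are nice",
                   "How are you"], columns=['words'])

# One True/False per row: does the row contain the phrase at all?
mask = df.words.str.contains(r'the answer')
print(mask.tolist())  # [True, True, False, False]

# count() counts non-NA entries, so every row is counted.
print(mask.count())   # 4

# Summing the booleans counts rows that contain the phrase, not occurrences.
print(mask.sum())     # 2

# str.count counts the matches inside each string, then sum() totals them.
print(df.words.str.count(r'the answer').sum())  # 3
```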
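str.contains returns a boolean Series with one True/False value per row, and Series.count() counts the non-NA values in it, so you get the number of rows (4). Summing the booleans instead would give 2, the number of rows that contain the phrase, which is still not the number of occurrences. You need str.count, which counts the matches within each string: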
In [5285]: df.words.str.count("the answer").sum()
Out[5285]: 3
In [5286]: df.words.str.count("the answer")
Out[5286]:
0 1
1 2
2 0
3 0
Name: words, dtype: int64
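One caveat: str.count treats its argument as a regular expression and counts non-overlapping matches. "the answer" contains no special characters, so it is safe as written, but a search string such as "(42)" (a hypothetical example, not from the question) would be read as a regex group. A minimal sketch, assuming you want the search string matched literally, escapes it with re.escape first:

```python
import re
import pandas as pd

df = pd.DataFrame(["the answer is 42",
                   "the answer (42) is here",
                   "dogs are nice"], columns=['words'])

needle = "(42)"  # hypothetical literal search string containing regex metacharacters

# Unescaped, "(42)" is a capture group that matches the bare text "42",
# so it is found once in each of the first two rows.
as_regex = df.words.str.count(needle).sum()

# re.escape turns the string into a pattern that matches it character for character.
as_literal = df.words.str.count(re.escape(needle)).sum()

print(as_regex)    # 2
print(as_literal)  # 1
```

For plain phrases like "the answer" the two calls give the same result; escaping only matters when the search text comes from user input or contains characters such as ( ) . ? * +.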