How to count word frequency in a Pandas DataFrame - Python

Question

I have created a Pandas dataframe from a dictionary. The dataframe looks like this:

      URL         TITLE
0   /xxxx.xx   Hi this is word count
1   /xxxx.xx   Hi this is Stack Overflow
2   /xxxx.xx   Stack Overflow Questions

I want to add a new column to this table that lists how many times the phrase "Stack Overflow" appears in each title. For example, it would look like this:

      URL         TITLE                          COUNT
0   /xxxx.xx   Hi this is word count               0
1   /xxxx.xx   Hi this is Stack Overflow           1
2   /xxxx.xx   Stack Overflow Questions            1

The count function does not seem to work on dictionaries, only on strings. Is there an easy way to do this?

Answer 1

Assuming this is actually a pandas dataframe, you can do this:

import pandas as pd

table = {   'URL': ['/xxxx.xx', '/xxxx.xx', '/xxxx.xx'], 
            'TITLE': ['Hi this is word count', 'Hi this is Stack Overflow', 'Stack Overflow Questions']}

df = pd.DataFrame(table)
df['COUNT'] = df.TITLE.str.count('Stack Overflow')
print(df)

This produces:

                       TITLE       URL  COUNT
0      Hi this is word count  /xxxx.xx      0
1  Hi this is Stack Overflow  /xxxx.xx      1
2   Stack Overflow Questions  /xxxx.xx      1
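One caveat: Series.str.count treats its argument as a regular expression, so a phrase containing characters such as "+" or "(" needs escaping. The sketch below assumes you want a literal, case-insensitive match; the sample titles and the `phrase` variable are made up for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    'TITLE': [
        'Hi this is word count',
        'stack overflow and Stack Overflow',
        'C++ tips (part 1)',
    ]
})

phrase = 'Stack Overflow'

# re.escape makes the phrase a literal pattern;
# re.IGNORECASE also counts lowercase occurrences
df['COUNT'] = df['TITLE'].str.count(re.escape(phrase), flags=re.IGNORECASE)

# Without re.escape, 'C++' would be an invalid regular expression
df['CPP_COUNT'] = df['TITLE'].str.count(re.escape('C++'))

print(df)
```

If an exact, case-sensitive match is what you need, dropping the `flags` argument gives the same behavior as the answer above.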

Answer 2

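The .str.count() method on a dataframe column is good at counting occurrences of a single value such as "Stack Overflow".

To do frequency analysis on multiple values, consider using collections.Counter(data) and its .most_common(k) method.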
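A minimal sketch of the Counter approach, assuming the titles live in a column named TITLE as in the question and that splitting on whitespace is an acceptable way to tokenize (both are assumptions for illustration):

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'TITLE': [
        'Hi this is word count',
        'Hi this is Stack Overflow',
        'Stack Overflow Questions',
    ]
})

# Split every title on whitespace and flatten into one list of words
words = [word for title in df['TITLE'] for word in title.split()]

# Count how often each word occurs across all titles
word_counts = Counter(words)

# The k most frequent words as (word, count) pairs
top_words = word_counts.most_common(3)
print(top_words)
```

Note that this counts single words across the whole column, so a two-word phrase like "Stack Overflow" is counted as "Stack" and "Overflow" separately. For a per-row count of one phrase, .str.count() is still the simpler tool.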

