Histogram based on frequent/common words

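I'm trying to create a histogram of the most frequent/common words, but I only get errors when I run the code. I managed to find the 10 most common words, but I can't visualize them in a histogram.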

description_list = df['description'].values.tolist()

from collections import Counter
Counter(" ".join(description_list).split()).most_common(10)

#histogram 
plt.bar(x, y)
plt.title("10 most frequent tokens in description")
plt.ylabel("Frequency")
plt.xlabel("Words")
plt.show

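It looks like a few things are missing here:

  1. The result of Counter(...).most_common(10) is not assigned to x or y
  2. x and y appear to be unbound
  3. plt.show is not called, so it either does nothing or prints something like <function show at 0x...>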
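
To make the first and third points concrete, here is a minimal sketch that uses only the standard library. The sample string and the `show` function are hypothetical stand-ins (the latter for plt.show) and are not part of the original code.

```python
from collections import Counter

# Hypothetical sample text standing in for the joined descriptions.
text = "a b a c b a"

# Bind the result to a name; in a script, a bare expression is simply discarded.
pairs = Counter(text.split()).most_common(2)
print(pairs)  # [('a', 3), ('b', 2)]

# Split the (word, count) tuples into two sequences, one per axis.
x, y = zip(*pairs)
print(x)  # ('a', 'b')
print(y)  # (3, 2)


def show():
    """Hypothetical stand-in for plt.show, only to contrast reference and call."""
    return "shown"


reference = show   # no parentheses: this is just the function object
called = show()    # parentheses: the function actually runs
print(callable(reference))  # True
print(called)               # shown
```

Unpacking with zip(*pairs) is one way to get x and y; the list comprehensions in the example that follows do the same job.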

Here is a reproducible example that fixes these issues:

from collections import Counter
import matplotlib.pyplot as plt
import pandas as pd

data = {
    "description": [
        "This is the first example",
        "This is the second example",
        "This is similar to the first two",
        "This exists add more words"
    ]
}
df = pd.DataFrame(data)


description_list = df['description'].values.tolist()

# Assign the result of the `most_common` call to a variable:
word_frequency = Counter(" ".join(description_list).split()).most_common(10)

# `most_common` returns a list of (word, count) tuples
words = [word for word, _ in word_frequency]
counts = [count for _, count in word_frequency]

plt.bar(words, counts)
plt.title("10 most frequent tokens in description")
plt.ylabel("Frequency")
plt.xlabel("Words")
plt.show()

Expected output:
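
As a rough text check of what the chart should show, the same counts can be printed without matplotlib. This is only a sketch: it rebuilds the sample descriptions as a plain list (an assumption made so the snippet runs on its own) and draws each bar as a run of `#` characters.

```python
from collections import Counter

# Same sample descriptions as in the example above, as a plain list.
description_list = [
    "This is the first example",
    "This is the second example",
    "This is similar to the first two",
    "This exists add more words",
]

word_frequency = Counter(" ".join(description_list).split()).most_common(10)

# One line per bar: the word, then one '#' per occurrence, then the count.
for word, count in word_frequency:
    print(f"{word:<8} {'#' * count} ({count})")
```

For this sample data, "This" appears 4 times, "is" and "the" 3 times each, and "first" and "example" twice each; the remaining bars have height 1. On Python 3.7+, words with equal counts are listed in the order they first appear.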