How to extract word frequency for a subset of words in R?

Question

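I have a data frame in which one column contains about 10,000 words and another column contains the corresponding word frequencies. I also have a vector of about 600 words. Each of the 600 words also appears in the data frame. How can I look up the frequencies of the 600 words in the vector from the 10,000-word data frame?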

Answer 1

One of many possible solutions, where df$words is the column of your data.frame that contains the words and wordsvector is the vector:

library(plyr)
# count how many rows each word has in the data.frame
freqwords <- ddply(df, .(words), summarize, n = length(words))
# keep only the words that appear in your vector
freqwords[freqwords$words %in% wordsvector, ]

If you provide some dummy data next time, we will be able to help you better.
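Since the question says the data frame already has a frequency column, counting may not be needed at all. Here is a minimal base R sketch on made-up data; the column names `words` and `freq` and the toy values are assumptions for illustration, not taken from the question:

```r
# Toy stand-in for the 10,000-word data frame (column names are assumed)
df <- data.frame(
  words = c("apple", "banana", "cherry", "date", "elder"),
  freq  = c(10, 3, 7, 1, 5),
  stringsAsFactors = FALSE
)

# Toy stand-in for the 600-word vector
wordsvector <- c("cherry", "apple")

# Rows of df whose word appears in the vector (kept in df's row order)
subset_df <- df[df$words %in% wordsvector, ]

# Frequencies returned in the order of wordsvector, via match()
freq_in_vector_order <- df$freq[match(wordsvector, df$words)]
```

`%in%` keeps the rows in the data frame's order, while `match()` returns the frequencies in the order of the vector, which is handy if the vector's order matters.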

Answer 2

Use dplyr's join functions.

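library(dplyr)

# turn the 600-word vector into a data frame with a "word" column
# (R names cannot start with a digit, hence vec_600 / df_600 / df_10000)
df_600 <- data.frame(word = vec_600, stringsAsFactors = FALSE)

# left join: keep every word in df_600 and attach its columns from df_10000
result <- left_join(x = df_600, y = df_10000, by = "word")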

where "word" is the name of the column that both data frames have in common.
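If dplyr is not available, base R's `merge()` with `all.x = TRUE` behaves much like a left join. The sketch below uses made-up data, and the object and column names (`df_600`, `df_10000`, `word`, `freq`) are hypothetical stand-ins rather than names from the question:

```r
# Toy stand-in for the 10,000-word data frame (names and values are assumed)
df_10000 <- data.frame(
  word = c("apple", "banana", "cherry", "date"),
  freq = c(10, 3, 7, 1),
  stringsAsFactors = FALSE
)

# Toy stand-in for the 600-word vector, wrapped in a one-column data frame
df_600 <- data.frame(word = c("cherry", "apple"), stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of df_600, like a left join;
# merge() sorts the result by the key column by default
merged <- merge(df_600, df_10000, by = "word", all.x = TRUE)
```

Unlike `left_join()`, which keeps the row order of its first argument, `merge()` sorts by the key unless `sort = FALSE` is passed.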

r

text-mining

dataframe

word-frequency