How to get a sentiment score (and keep the sentiment words) in quanteda?
Consider this simple example
library(tibble)
library(quanteda)
tibble(mytext = c('this is a good movie',
                  'oh man this is really bad',
                  'quanteda is great!'))
# A tibble: 3 x 1
mytext
<chr>
1 this is a good movie
2 oh man this is really bad
3 quanteda is great!
I want to do some basic sentiment analysis, but with a twist. Here is my dictionary, stored in a regular tibble
mydictionary <- tibble(sentiment = c('positive', 'positive', 'negative'),
                       word = c('good', 'great', 'bad'))
# A tibble: 3 x 2
sentiment word
<chr> <chr>
1 positive good
2 positive great
3 negative bad
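Essentially, I would like to count how many positive and negative words are detected in each sentence, and also keep track of the matching words. In other words, the output should look like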
mytext nb.pos nb.neg pos.words
1 this is a good and great movie 2 0 good, great
2 oh man this is really bad 0 1 bad
3 quanteda is great! 1 0 great
How can I do that in quanteda? Is this possible?
Thanks!
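Stay tuned for quanteda v. 2.1, in which we will have greatly expanded, dedicated functions for sentiment analysis. In the meantime, see below. Note that I made some adjustments, since there is a discrepancy between the text you report in your expected output and your input text, and since your pos.words column contains all sentiment words, not just the positive ones. Below, I compute both the positive matches and all sentiment matches.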
# note the amended input text
mytext <- c(
  "this is a good and great movie",
  "oh man this is really bad",
  "quanteda is great!"
)
mydictionary <- tibble::tibble(
  sentiment = c("positive", "positive", "negative"),
  word = c("good", "great", "bad")
)
library("quanteda", warn.conflicts = FALSE)
## Package version: 2.0.9000
## Parallel computing: 2 of 8 threads used.
## See https://quanteda.io for tutorials and examples.
# make the dictionary into a quanteda dictionary
qdict <- as.dictionary(mydictionary)
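To make the conversion step less opaque, here is a minimal sketch of the structure a quanteda dictionary holds: a named list with one character vector of words per key. It uses a plain data.frame instead of a tibble so that it runs without extra packages, and the names `keys` and `qdict_alt` are made up for illustration. Passing such a named list to `dictionary()` is an alternative way to build the dictionary, not necessarily what `as.dictionary()` does internally.

```r
# Same dictionary as above, as a plain data.frame (assumption: no tibble needed here)
mydictionary <- data.frame(
  sentiment = c("positive", "positive", "negative"),
  word = c("good", "great", "bad"),
  stringsAsFactors = FALSE
)

# One character vector of words per sentiment key; split() orders keys alphabetically
keys <- split(mydictionary$word, mydictionary$sentiment)
str(keys)

# If quanteda is installed, the named list can be handed to dictionary() directly
if (requireNamespace("quanteda", quietly = TRUE)) {
  qdict_alt <- quanteda::dictionary(keys)  # hypothetical name, for illustration only
}
```

The alphabetical key order (negative before positive) is worth keeping in mind, because the code further down renames the count columns by position.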
Now we can use the lookup functions to get to your final data.frame.
# get the sentiment scores
toks <- tokens(mytext)
df <- toks %>%
  tokens_lookup(dictionary = qdict) %>%
  dfm() %>%
  convert(to = "data.frame")
names(df)[2:3] <- c("nb.neg", "nb.pos")
# get the matches for pos and all words
poswords <- tokens_keep(toks, qdict["positive"])
allwords <- tokens_keep(toks, qdict)
data.frame(
  mytext = mytext,
  df[, 2:3],
  pos.words = sapply(poswords, paste, collapse = ", "),
  all.words = sapply(allwords, paste, collapse = ", "),
  row.names = NULL
)
##                           mytext nb.neg nb.pos   pos.words   all.words
## 1 this is a good and great movie      0      2 good, great good, great
## 2      oh man this is really bad      1      0                     bad
## 3             quanteda is great!      0      1       great       great
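As a sanity check on the counts above, the same numbers can be reproduced in base R without quanteda. This is only a rough sketch under a loud assumption: it uses a crude tokenizer (lowercase, split on anything that is not a letter a-z) and exact word matching, whereas quanteda's `tokens()` and dictionary matching handle punctuation, case and glob patterns more carefully. The names `pos`, `neg`, `words` and `check` are made up for illustration.

```r
mytext <- c(
  "this is a good and great movie",
  "oh man this is really bad",
  "quanteda is great!"
)
pos <- c("good", "great")
neg <- "bad"

# Crude tokenizer (assumption): lowercase, then split on runs of non-letters
words <- strsplit(tolower(mytext), "[^a-z]+")

# Count exact matches per text and keep the matched positive words
check <- data.frame(
  mytext = mytext,
  nb.neg = sapply(words, function(w) sum(w %in% neg)),
  nb.pos = sapply(words, function(w) sum(w %in% pos)),
  pos.words = sapply(words, function(w) paste(w[w %in% pos], collapse = ", ")),
  stringsAsFactors = FALSE
)
check
```

For real text the quanteda pipeline is the better choice; this version is only meant to show what is being counted.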