R - finding max value of Corpus vector

Question

I am new to R. I have been trying to use the tm library to get the percentage of each emotion in every element.

I started with:

   sc <- Corpus(VectorSource(email))

Then I tried to strip out unnecessary words using:

sclean <- tm_map(sc, removePunctuation)
sclean <- tm_map(sclean, content_transformer(tolower))
sclean <- tm_map(sclean, removeWords, stopwords(kind="en"))
sclean <- tm_map(sclean, removeNumbers)
sclean <- tm_map(sclean, stripWhitespace)
sclean <- tm_map(sclean, removeWords, commonwords)
# get_nrc_sentiment() comes from the syuzhet package, not from tm
sent <- sent_Analysed <- get_nrc_sentiment(unlist(as.list(sclean)))

The result I get looks like this (each row is one element of "sent"):

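From this I want to find the maximum value in each row and compute its percentage of the row total (excluding the negative and positive columns). For example, for row 2: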

The max value is: trust (40). The percentage would be: 32.5 (max / sum (= 123) * 100)

I am struggling to find the max value and the sum of all the numbers except the last 2 columns (each row is printed by a for loop).

Answer 1

Using a smaller example than yours...

sent <- data.frame(a1=c(1,2),a2=c(2,3),a3=c(4,1))
sent
  a1 a2 a3
1  1  2  4
2  2  3  1

You can do this in base R using apply, as follows...

sentsum <- data.frame(best=names(sent)[apply(sent,1,which.max)], #name of highest column
                      score=apply(sent,1,max), #value of highest column
                      stringsAsFactors = FALSE)
sentsum$percent <- 100*sentsum$score/rowSums(sent) #percent of row sum

sentsum
  best score  percent
1   a3     4 57.14286
2   a2     3 50.00000
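
The toy example above uses every column, whereas the question wants the last two columns (negative and positive) left out. A minimal sketch of the same apply approach with that exclusion is below. It assumes the usual get_nrc_sentiment() layout of eight emotion columns followed by negative and positive. The data frame is a hypothetical stand-in with invented numbers (the second row is chosen so that it reproduces the trust = 40, sum = 123 case from the question), and the zero-row guard is an addition that is not part of the original answer.

```r
# Hypothetical stand-in for the output of get_nrc_sentiment(); the values are
# invented for illustration and are not the asker's real data.
sent <- data.frame(
  anger = c(2, 10), anticipation = c(5, 20), disgust = c(0, 5),
  fear = c(1, 8), joy = c(9, 15), sadness = c(3, 10),
  surprise = c(0, 15), trust = c(4, 40),
  negative = c(7, 30), positive = c(9, 60)
)

# Keep only the emotion columns, dropping the two polarity columns by name.
emotions <- sent[, !(names(sent) %in% c("negative", "positive")), drop = FALSE]

totals <- rowSums(emotions)  # row sums over the emotion columns only

sentsum <- data.frame(
  best  = names(emotions)[apply(emotions, 1, which.max)],  # first max wins on ties
  score = apply(emotions, 1, max),
  total = totals,
  stringsAsFactors = FALSE
)

# Rows with no emotion words at all would give 0 / 0, so mark them as NA.
sentsum$percent <- ifelse(totals > 0, 100 * sentsum$score / totals, NA_real_)
sentsum$best[totals == 0] <- NA

print(sentsum)
```

For the second row this gives trust as the best column with a score of 40 out of 123, about 32.5 percent, which matches the example in the question. Dropping the columns by name rather than by position keeps the code working if the column order ever differs.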
