Extracting NLP part-of-speech labels from customer reviews in R

I have the following data frame, which contains reviews that customers left on a restaurant's website:

id<-c(1,2,3,4,5,6)
review<- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day", "Excellent service as usual","Love this place!", "Service and quality of food first class"," Customer services was exceptional by all staff","excellent services")
df<-data.frame(id, review) 

I am now looking for a way (preferably without a for loop) to find the part-of-speech labels for each customer's review in R.

Given that the id column in your example is just the row index, I believe you can get the desired output with the pos() function from the qdap package.

library(qdap)
pos(df$review)

If you do need grouping, because each customer has more than one review, you can use

pos_by(df$review,df$id)

This is a very simple adaptation of the example on the Maxent_POS_Tag_Annotator help page.

df<-data.frame(id, review, stringsAsFactors=FALSE) 

library(NLP)
library(openNLP)

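# Build the annotators once and reuse them for every review
word_ann <- Maxent_Word_Token_Annotator()
pos_ann <- Maxent_POS_Tag_Annotator()

# lapply returns an unnamed list, one character vector per review
review.pos <- 
  lapply(df$review, function(ii) {
    # Treat each review as a single sentence spanning the whole string
    a2 <- Annotation(1L, "sentence", 1L, nchar(ii))
    a2 <- annotate(ii, word_ann, a2)
    a3 <- annotate(ii, pos_ann, a2)
    # Keep only the word annotations and pull out their POS tags
    a3w <- subset(a3, type == "word")
    tags <- sapply(a3w$features, `[[`, "POS")
    sprintf("%s/%s", as.String(ii)[a3w], tags)
  })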

This produces the following output:

#[[1]]
# [1] "the/DT"       "food/NN"      "was/VBD"      "very/RB"      "delicious/JJ"
# [6] "and/CC"       "hearty/NN"    "-/:"          "perfect/JJ"   "to/TO"       
#[11] "warm/VB"      "up/RP"        "during/IN"    "a/DT"         "freezing/JJ" 
#[16] "winters/NNS"  "day/NN"      
#
#[[2]]
#[1] "Excellent/JJ" "service/NN"   "as/IN"        "usual/JJ"    
#
#[[3]]
#[1] "Love/VB"  "this/DT"  "place/NN" "!/."     
#
#[[4]]
#[1] "Service/NNP" "and/CC"      "quality/NN"  "of/IN"       "food/NN"    
#[6] "first/JJ"    "class/NN"   
#
#[[5]]
#[1] "Customer/NN"    "services/NNS"   "was/VBD"        "exceptional/JJ"
#[5] "by/IN"          "all/DT"         "staff/NN"      
#
#[[6]]
#[1] "excellent/JJ" "services/NNS"

Reshaping this into whatever format you want should be relatively straightforward.
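
As one possible reshaping, the sketch below turns a list of word/tag strings into a long data frame with one row per token, then counts tags per review. It uses only base R. The `tagged` list is a hand-copied subset of the output above, and the names `tagged`, `ids`, `split_tag`, `tokens` and `tag_counts` are illustrative, not part of openNLP.

```r
# Hand-copied subset of the output above (stand-in for review.pos)
tagged <- list(
  c("Excellent/JJ", "service/NN", "as/IN", "usual/JJ"),
  c("Love/VB", "this/DT", "place/NN", "!/."),
  c("excellent/JJ", "services/NNS")
)
ids <- c(2, 3, 6)  # the review ids these three elements belong to

# Split "word/tag" at the last slash, so a word containing "/" stays intact
split_tag <- function(x) {
  data.frame(
    word = sub("/[^/]*$", "", x),
    tag  = sub("^.*/", "", x),
    stringsAsFactors = FALSE
  )
}

# One row per token, with the review id repeated for each of its tokens
tokens <- do.call(
  rbind,
  Map(function(id, x) cbind(id = id, split_tag(x)), ids, tagged)
)

# Tag counts per review (rows are ids, columns are tags)
tag_counts <- table(tokens$id, tokens$tag)
```

With the full result, `tagged` would be `review.pos` and `ids` would be `df$id`.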

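If you don't mind trying a GitHub package, I have the tagger package, which wraps NLP/openNLP to quickly accomplish some of the tasks Python users perform with POS tags. Note that the output prints in the traditional word/tag format, but the object is actually a list of named vectors. This makes working with the words and tags easier. Here I demonstrate how to get the tags, along with some of the manipulations that tagger makes easy: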

# First load your data and get the tagger package, for those playing along at home

id<-c(1,2,3,4,5,6)
review<- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day", "Excellent service as usual","Love this place!", "Service and quality of food first class"," Customer services was exceptional by all staff","excellent services")
df<-data.frame(id, review)  

if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/tagger")

# Now tag and manipulate

(out <- tag_pos(as.character(df[["review"]])))

## [1] "the/DT food/NN was/VBD very/RB delicious/JJ and/CC hearty/NN -/: perfect/JJ to/TO warm/VB up/RP during/IN a/DT freezing/JJ winters/NNS day/NN"
## [2] "Excellent/JJ service/NN as/IN usual/JJ"                                                                                                       
## [3] "Love/VB this/DT place/NN !/."                                                                                                                 
## [4] "Service/NNP and/CC quality/NN of/IN food/NN first/JJ class/NN"                                                                                
## [5] "Customer/NN services/NNS was/VBD exceptional/JJ by/IN all/DT staff/NN"                                                                        
## [6] "excellent/JJ services/NNS"  


c(out)                         ## True structure: list of named vectors
as_word_tag(out)               ## Match the print method (less mutable)
count_tags(out, df[["id"]])    ## Get counts by row
plot(out)                      ## tag distribution (plot at end)

as_basic(out)                  ## basic pos tags

## [1] "the/article food/noun was/verb very/adverb delicious/adjective and/conjunction hearty/noun -/. perfect/adjective to/preposition warm/verb up/preposition during/preposition a/article freezing/adjective winters/noun day/noun"
## [2] "Excellent/adjective service/noun as/preposition usual/adjective"                                                                                                                                                               
## [3] "Love/verb this/adjective place/noun !/."                                                                                                                                                                                       
## [4] "Service/noun and/conjunction quality/noun of/preposition food/noun first/adjective class/noun"                                                                                                                                 
## [5] "Customer/noun services/noun was/verb exceptional/adjective by/preposition all/adjective staff/noun"                                                                                                                            
## [6] "excellent/adjective services/noun"          


select_tags(out, c("NN", "NNP", "NNPS", "NNS"))

## [1] "food/NN hearty/NN winters/NNS day/NN"   
## [2] "service/NN"                             
## [3] "place/NN"                               
## [4] "Service/NNP quality/NN food/NN class/NN"
## [5] "Customer/NN services/NNS staff/NN"      
## [6] "services/NNS"

Everything works nicely within a magrittr pipeline, which is my preference. The Examples section of the README gives a good overview of the package's usage.