在R中提取客户评论的NLP词性标签
Extracting NLP part-of-speech labels of customers' review in R
我有以下数据框,其中包含顾客在餐厅网站上留下的评论:
id<-c(1,2,3,4,5,6)
review<- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day", "Excellent service as usual","Love this place!", "Service and quality of food first class"," Customer services was exceptional by all staff","excellent services")
df<-data.frame(id, review)
现在我正在寻找一种方法(最好不使用 for loop
)在 R
中每个客户的评论中找到 part-of-speech labels。
考虑到您的示例中的 id 列只是行索引,我相信您可以使用 qdap
包中的 pos()
函数获得所需的输出。
library(qdap)
pos(df$review)
如果您确实需要分组,因为每个客户有多个评论,您可以使用
pos_by(df$review,df$id)
这是对 Maxent_POS_Tag_Annotator 帮助页面上示例的非常简单的改编。
df<-data.frame(id, review, stringsAsFactors=FALSE)
library(NLP)
library(openNLP)
review.pos <-
sapply(df$review, function(ii) {
a2 <- Annotation(1L, "sentence", 1L, nchar(ii))
a2 <- annotate(ii, Maxent_Word_Token_Annotator(), a2)
a3 <- annotate(ii, Maxent_POS_Tag_Annotator(), a2)
a3w <- subset(a3, type == "word")
tags <- sapply(a3w$features, `[[`, "POS")
sprintf("%s/%s", as.String(ii)[a3w], tags)
})
导致此输出:
#[[1]]
# [1] "the/DT" "food/NN" "was/VBD" "very/RB" "delicious/JJ"
# [6] "and/CC" "hearty/NN" "-/:" "perfect/JJ" "to/TO"
#[11] "warm/VB" "up/RP" "during/IN" "a/DT" "freezing/JJ"
#[16] "winters/NNS" "day/NN"
#
#[[2]]
#[1] "Excellent/JJ" "service/NN" "as/IN" "usual/JJ"
#
#[[3]]
#[1] "Love/VB" "this/DT" "place/NN" "!/."
#
#[[4]]
#[1] "Service/NNP" "and/CC" "quality/NN" "of/IN" "food/NN"
#[6] "first/JJ" "class/NN"
#
#[[5]]
#[1] "Customer/NN" "services/NNS" "was/VBD" "exceptional/JJ"
#[5] "by/IN" "all/DT" "staff/NN"
#
#[[6]]
#[1] "excellent/JJ" "services/NNS"
将其调整为您想要的任何格式应该相对简单。
如果你不介意尝试 GitHub 包,我有 tagger package 来包装 NLP/openNLP 以 Python 用户操作 pos 标签的方式快速完成一些任务。请注意,输出以传统的 word/tag
格式打印,但实际上该对象实际上是一个命名向量列表。这使得使用单词和标签更容易。在这里,我演示了如何获取标签以及 tagger 使之变得简单的一些操作:
#首先加载你的数据,并为那些在家玩的人获取标注器包
id<-c(1,2,3,4,5,6)
review<- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day", "Excellent service as usual","Love this place!", "Service and quality of food first class"," Customer services was exceptional by all staff","excellent services")
df<-data.frame(id, review)
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/tagger")
# 现在标记和操作
(out <- tag_pos(as.character(df[["review"]])))
## [1] "the/DT food/NN was/VBD very/RB delicious/JJ and/CC hearty/NN -/: perfect/JJ to/TO warm/VB up/RP during/IN a/DT freezing/JJ winters/NNS day/NN"
## [2] "Excellent/JJ service/NN as/IN usual/JJ"
## [3] "Love/VB this/DT place/NN !/."
## [4] "Service/NNP and/CC quality/NN of/IN food/NN first/JJ class/NN"
## [5] "Customer/NN services/NNS was/VBD exceptional/JJ by/IN all/DT staff/NN"
## [6] "excellent/JJ services/NNS"
c(out) ## True structure: list of named vectors
as_word_tag(out) ## Match the print method (less mutable)
count_tags(out, df[["id"]]) ## Get counts by row
plot(out) ## tag distribution (plot at end)
as_basic(out) ## basic pos tags
## [1] "the/article food/noun was/verb very/adverb delicious/adjective and/conjunction hearty/noun -/. perfect/adjective to/preposition warm/verb up/preposition during/preposition a/article freezing/adjective winters/noun day/noun"
## [2] "Excellent/adjective service/noun as/preposition usual/adjective"
## [3] "Love/verb this/adjective place/noun !/."
## [4] "Service/noun and/conjunction quality/noun of/preposition food/noun first/adjective class/noun"
## [5] "Customer/noun services/noun was/verb exceptional/adjective by/preposition all/adjective staff/noun"
## [6] "excellent/adjective services/noun"
select_tags(out, c("NN", "NNP", "NNPS", "NNS"))
## [1] "food/NN hearty/NN winters/NNS day/NN"
## [2] "service/NN"
## [3] "place/NN"
## [4] "Service/NNP quality/NN food/NN class/NN"
## [5] "Customer/NN services/NNS staff/NN"
## [6] "services/NNS"
一切都在 magrittr 管道中运行得很好,这是我的偏好。 Examples Section of the README 很好地概述了包的用法。
我有以下数据框,其中包含顾客在餐厅网站上留下的评论:
id<-c(1,2,3,4,5,6)
review<- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day", "Excellent service as usual","Love this place!", "Service and quality of food first class"," Customer services was exceptional by all staff","excellent services")
df<-data.frame(id, review)
现在我正在寻找一种方法(最好不使用 for loop
)在 R
中每个客户的评论中找到 part-of-speech labels。
考虑到您的示例中的 id 列只是行索引,我相信您可以使用 qdap
包中的 pos()
函数获得所需的输出。
library(qdap)
pos(df$review)
如果您确实需要分组,因为每个客户有多个评论,您可以使用
pos_by(df$review,df$id)
这是对 Maxent_POS_Tag_Annotator 帮助页面上示例的非常简单的改编。
df<-data.frame(id, review, stringsAsFactors=FALSE)
library(NLP)
library(openNLP)
review.pos <-
sapply(df$review, function(ii) {
a2 <- Annotation(1L, "sentence", 1L, nchar(ii))
a2 <- annotate(ii, Maxent_Word_Token_Annotator(), a2)
a3 <- annotate(ii, Maxent_POS_Tag_Annotator(), a2)
a3w <- subset(a3, type == "word")
tags <- sapply(a3w$features, `[[`, "POS")
sprintf("%s/%s", as.String(ii)[a3w], tags)
})
导致此输出:
#[[1]]
# [1] "the/DT" "food/NN" "was/VBD" "very/RB" "delicious/JJ"
# [6] "and/CC" "hearty/NN" "-/:" "perfect/JJ" "to/TO"
#[11] "warm/VB" "up/RP" "during/IN" "a/DT" "freezing/JJ"
#[16] "winters/NNS" "day/NN"
#
#[[2]]
#[1] "Excellent/JJ" "service/NN" "as/IN" "usual/JJ"
#
#[[3]]
#[1] "Love/VB" "this/DT" "place/NN" "!/."
#
#[[4]]
#[1] "Service/NNP" "and/CC" "quality/NN" "of/IN" "food/NN"
#[6] "first/JJ" "class/NN"
#
#[[5]]
#[1] "Customer/NN" "services/NNS" "was/VBD" "exceptional/JJ"
#[5] "by/IN" "all/DT" "staff/NN"
#
#[[6]]
#[1] "excellent/JJ" "services/NNS"
将其调整为您想要的任何格式应该相对简单。
如果你不介意尝试 GitHub 包,我有 tagger package 来包装 NLP/openNLP 以 Python 用户操作 pos 标签的方式快速完成一些任务。请注意,输出以传统的 word/tag
格式打印,但实际上该对象实际上是一个命名向量列表。这使得使用单词和标签更容易。在这里,我演示了如何获取标签以及 tagger 使之变得简单的一些操作:
#首先加载你的数据,并为那些在家玩的人获取标注器包
id<-c(1,2,3,4,5,6)
review<- c("the food was very delicious and hearty - perfect to warm up during a freezing winters day", "Excellent service as usual","Love this place!", "Service and quality of food first class"," Customer services was exceptional by all staff","excellent services")
df<-data.frame(id, review)
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/tagger")
# 现在标记和操作
(out <- tag_pos(as.character(df[["review"]])))
## [1] "the/DT food/NN was/VBD very/RB delicious/JJ and/CC hearty/NN -/: perfect/JJ to/TO warm/VB up/RP during/IN a/DT freezing/JJ winters/NNS day/NN"
## [2] "Excellent/JJ service/NN as/IN usual/JJ"
## [3] "Love/VB this/DT place/NN !/."
## [4] "Service/NNP and/CC quality/NN of/IN food/NN first/JJ class/NN"
## [5] "Customer/NN services/NNS was/VBD exceptional/JJ by/IN all/DT staff/NN"
## [6] "excellent/JJ services/NNS"
c(out) ## True structure: list of named vectors
as_word_tag(out) ## Match the print method (less mutable)
count_tags(out, df[["id"]]) ## Get counts by row
plot(out) ## tag distribution (plot at end)
as_basic(out) ## basic pos tags
## [1] "the/article food/noun was/verb very/adverb delicious/adjective and/conjunction hearty/noun -/. perfect/adjective to/preposition warm/verb up/preposition during/preposition a/article freezing/adjective winters/noun day/noun"
## [2] "Excellent/adjective service/noun as/preposition usual/adjective"
## [3] "Love/verb this/adjective place/noun !/."
## [4] "Service/noun and/conjunction quality/noun of/preposition food/noun first/adjective class/noun"
## [5] "Customer/noun services/noun was/verb exceptional/adjective by/preposition all/adjective staff/noun"
## [6] "excellent/adjective services/noun"
select_tags(out, c("NN", "NNP", "NNPS", "NNS"))
## [1] "food/NN hearty/NN winters/NNS day/NN"
## [2] "service/NN"
## [3] "place/NN"
## [4] "Service/NNP quality/NN food/NN class/NN"
## [5] "Customer/NN services/NNS staff/NN"
## [6] "services/NNS"
一切都在 magrittr 管道中运行得很好,这是我的偏好。 Examples Section of the README 很好地概述了包的用法。