For text mining in R, how do I combine a DocumentTermMatrix with the original data frame?

Question

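What I am trying to do is write code that lets me classify tweets. In the example below, I want to take tweets about credit cards and determine whether or not they are related to travel.

Here is the initial data set: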

id <- c(123, 124, 125, 126, 127)
text <- c("Since I love to travel, this is what I rely on every time.",
          "I got this card for the no international transaction fee",
          "I got this card mainly for the flight perks",
          "Very good card, easy application process",
          "The customer service is outstanding!")
travel_cat <- c(1, 0, 1, 0, 0)
# The label vector is named travel_cat, so that is the name passed here
df_all <- data.frame(id, text, travel_cat)

Output 1:

id  text                                                        travel_cat
123 Since I love to travel, this is what I rely on every time.  1
124 I got this card for the no international transaction fee    0
125 I got this card mainly for the flight perks                 1
126 Very good card, easy application process                    0
127 The customer service is outstanding!                        0

Then I create a data frame that contains only the text field and run the text analysis:

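myvars <- c("text")
df <- df_all[myvars]

library(tm)
# Note: newer tm releases expect DataframeSource() input to have
# doc_id and text columns; this call matches the older behaviour
corpus <- Corpus(DataframeSource(df))
corpus <- tm_map(corpus, content_transformer(tolower))     # lower-case
corpus <- tm_map(corpus, removePunctuation)                # strip punctuation
corpus <- tm_map(corpus, removeWords, stopwords("english")) # drop stop words
corpus <- tm_map(corpus, stripWhitespace)                  # collapse spaces
# One row per document, one column per term
dtm <- as.matrix(DocumentTermMatrix(corpus))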

Output 2 (dtm):

Docs    application card    customer    easy    every ... etc.
1       0           0       0           1       0
2       0           1       0           0       1
3       0           1       0           0       0
4       1           1       0           0       0
5       0           0       1           0       0

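How do I then tie this back to the original data, so that the result contains the fields from both the original data set and the matrix (Output 1 + Output 2): id, text, travel_cat + application, card, customer, easy, every...?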

Answer 1

Try cbind():

allcombined <- cbind(dtm,df_all)

Is this what you are looking for?
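
The cbind() suggestion relies on the rows of the matrix being in the same order as the rows of the data frame. Here is a minimal sketch of that step, with a couple of safety checks added. To keep it runnable without tm installed, the term matrix is a hand-built stand-in for as.matrix(DocumentTermMatrix(corpus)) that counts only three terms; the names terms, term_df, combined and the term_ prefix are illustrative assumptions and not part of the original post.

```r
# Original data frame from the question.
df_all <- data.frame(
  id = c(123, 124, 125, 126, 127),
  text = c("Since I love to travel, this is what I rely on every time.",
           "I got this card for the no international transaction fee",
           "I got this card mainly for the flight perks",
           "Very good card, easy application process",
           "The customer service is outstanding!"),
  travel_cat = c(1, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Stand-in for as.matrix(DocumentTermMatrix(corpus)): one row per document,
# one column per term. Only three terms are counted (assumption, for brevity).
terms <- c("card", "travel", "flight")
words <- strsplit(tolower(gsub("[[:punct:]]", "", df_all$text)), "\\s+")
dtm <- sapply(terms, function(term) {
  vapply(words, function(w) sum(w == term), integer(1))
})

# cbind() matches rows purely by position, so check the row counts first.
stopifnot(nrow(dtm) == nrow(df_all))

# Prefix the term columns so that a term such as "text" or "id" cannot
# collide with a column that already exists in df_all.
term_df <- as.data.frame(dtm)
names(term_df) <- paste0("term_", names(term_df))

combined <- cbind(df_all, term_df)
print(names(combined))
```

Putting df_all first keeps id, text and travel_cat as the leading columns. If any documents were dropped or reordered between building the corpus and building the matrix, a positional cbind() would silently misalign rows, which is why the row-count check comes before the bind.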

nlp

r

text-mining