Breaking a paragraph into a vector of sentences in R
I have the following paragraph:
Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)
In order to apply the calculate_total_presence_sentiment command from the RSentiment package, I would like to split this paragraph into a vector of sentences, like so:
[1] "Well, um...such a personal topic."
[2] "No wonder I am the first to write a review."
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."
[5] "'Nuff said."
[6] ":-)"
Thanks for any help with this.
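For context, once the paragraph is split into sentences, the RSentiment call would look roughly like this (a minimal sketch; it assumes the RSentiment package is installed and that calculate_total_presence_sentiment accepts a character vector with one sentence per element, as its documentation describes):

```r
# Sketch only: the sentence vector here is typed by hand to stand in
# for whatever splitting method is used below.
library(RSentiment)

sentences <- c("Well, um...such a personal topic.",
               "No wonder I am the first to write a review.")

# Tallies how many sentences fall into each sentiment category.
calculate_total_presence_sentiment(sentences)
```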
qdap has a handy function for this:
sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, which matches the old version of the openNLP package's now-removed sentDetect function.
library(qdap)
txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
sent_detect_nlp(txt)
#[1] "Well, um...such a personal topic."
#[2] "No wonder I am the first to write a review."
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
#[4] "And I had, well, major problems in this area and now I don't."
#[5] "'Nuff said."
#[6] ":-)"
A quick-and-dirty solution
> data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
> ?"regular expression"
> strsplit(data, "(?<=[^.][.][^.])", perl=TRUE)
[[1]]
[1] "Well, um...such a personal topic. "
[2] "No wonder I am the first to write a review. "
[3] "Suffice to say this stuff does just what they claim and tastes pleasant. "
[4] "And I had, well, major problems in this area and now I don't. "
[5] "'Nuff said. "
[6] ":-)"
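Note that this split leaves a trailing space on each element. If that matters for downstream scoring, base R's trimws() can clean it up (a small follow-on sketch using the same lookbehind pattern):

```r
data <- "Well, um...such a personal topic. No wonder I am the first to write a review."

# strsplit() returns a list; take the first element, then strip the
# leading/trailing whitespace that the lookbehind split leaves behind.
sentences <- trimws(strsplit(data, "(?<=[^.][.][^.])", perl = TRUE)[[1]])
sentences
```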
Using tools from https://cran.r-project.org/web/views/NaturalLanguageProcessing.html
You can save the text in a .txt file. Make sure each line of the .txt file contains exactly one sentence that should become an element of the vector.
Then use the base function readLines('filepath/filename.txt').
The resulting character vector contains each line of the original text file as one element.
> mylines <- readLines('text.txt')
Warning message:
In readLines("text.txt") : incomplete final line found on 'text.txt'
> mylines
[1] "Well, um...such a personal topic."
[2] "No wonder I am the first to write a review."
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."
[5] "'Nuff said."
[6] ":-)"
> mylines[3]
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
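The "incomplete final line" warning only means the file does not end with a newline; the data is still read correctly. To silence it, readLines() takes a warn argument:

```r
# warn = FALSE suppresses the "incomplete final line" message;
# adding a trailing newline to text.txt would also make it go away.
mylines <- readLines('text.txt', warn = FALSE)
```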