Text mining within the same sentence in R
I have a text file:
"I am writing today. Today I am thinking of writing. Today is great day"
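I am trying to count the sentences in which "writing today" is mentioned. It may happen that "writing" and "today" are not next to each other but are still part of the same sentence (e.g. the second sentence), and those cases need to be captured too.
So in the example above, my count would be 2.
Any idea how to do this in R?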
TIA
There are many ways to do this, but using tidytext:
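library(tidyverse)
library(tidytext)

# tibble() replaces data_frame(), which is deprecated in current tibble releases
tibble(text = "I am writing today. Today I am thinking of writing. Today is great day") %>%
  # one row per sentence, keeping the original capitalization
  unnest_tokens(sentence, text, 'sentences', to_lower = FALSE) %>%
  mutate(sentence_number = row_number()) %>%
  # one row per lowercased word, keeping the sentence column alongside it
  unnest_tokens(word, sentence, 'words', drop = FALSE) %>%
  group_by(sentence_number) %>%
  # keep only the sentences that contain both words
  filter('today' %in% word, 'writing' %in% word) %>%
  # collapse back to one row per matching sentence, then count the rows
  select(-word) %>% distinct() %>% ungroup() %>%
  mutate(count = n())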
#> # A tibble: 2 × 3
#>   sentence                        sentence_number count
#>   <chr>                                     <int> <int>
#> 1 I am writing today.                           1     2
#> 2 Today I am thinking of writing.               2     2
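
For comparison, the same count can probably be obtained in base R without any extra packages. This is only a minimal sketch, not part of the tidytext approach above: the helper name contains_all is made up for illustration, and the sentence splitting is a naive assumption (split on whitespace that follows ".", "!" or "?"), so abbreviations such as "e.g." would be split incorrectly.

```r
text <- "I am writing today. Today I am thinking of writing. Today is great day"

# Naive sentence split (assumption): break on whitespace that follows . ! or ?
sentences <- strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

# Hypothetical helper: TRUE for each sentence that contains every target word.
# Matching is case-insensitive and uses word boundaries, so "today" does not
# match inside a longer word.
contains_all <- function(sentences, targets) {
  lowered <- tolower(sentences)
  per_word <- lapply(targets, function(w) {
    grepl(paste0("\\b", w, "\\b"), lowered, perl = TRUE)
  })
  # Combine the per-word logical vectors with element-wise AND
  Reduce(`&`, per_word)
}

matched <- contains_all(sentences, c("writing", "today"))

sentences[matched]  # the matching sentences
sum(matched)        # the number of matching sentences
```

For the sample text, sum(matched) gives 2, the same count as the tidytext pipeline. The tidytext version is likely the safer choice for real documents, since its sentence tokenizer handles more edge cases than this regular expression.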