Color highlighting text in R for a pre-defined list of words
Suppose I have a set of documents such as:
text = c("is it possible to highlight text for some words" ,
"suppose i want words like words to be red and words like text to be blue")
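I was wondering whether it is possible to highlight documents with color for a pre-defined list of words using R (especially for a large corpus). Each word in the list would get a specific color. For example, highlight "words" in red and "text" in blue.
Here is a somewhat hacky solution to this problem; it does not scale well to large corpora. I would be curious to see whether there is a more concise, elegant, and scalable way to do this.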
library(tidyverse)
library(crayon)
# define text
text <- c("is it possible to highlight text for some words",
          "suppose i want words like words to be red and words like text to be blue")
# split each document into its unique words
# (unique() drops repeated words, so the printed text is not the full document)
unique_words <- function(x) {
  purrr::map(.x = x,
             .f = ~ unique(base::strsplit(x = ., split = " ")[[1]]))
}
# creating a dataframe with crayonized text
df <-
  tibble::enframe(unique_words(x = text)) %>%
  tidyr::unnest(cols = value) %>%
  # here you can specify the color/word combinations you need
  dplyr::mutate(value2 = dplyr::case_when(value == "text" ~ crayon::blue(value),
                                          value == "words" ~ crayon::red(value),
                                          TRUE ~ value)) %>%
  dplyr::select(-value)
# printing the text
cat(df$value2, "\n")
P.S. Unfortunately, reprex does not work with colored text, so I cannot produce a full reprex.
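One way to avoid a `case_when()` branch per word is to keep the word/color pairs in a lookup vector and index it with the tokens of each document. The following is only a sketch under a few assumptions: the name `colour_codes` is made up for illustration, and it writes raw ANSI escape codes (31 for red, 34 for blue, 39 to reset the foreground) instead of calling crayon, so that it runs in base R without extra packages. It matches whole space-separated tokens only, so a token such as "words," with attached punctuation would not be colored.

```r
# define text
text <- c("is it possible to highlight text for some words",
          "suppose i want words like words to be red and words like text to be blue")

# hypothetical lookup: word -> ANSI foreground colour code (31 = red, 34 = blue)
colour_codes <- c(words = 31, text = 34)

# split every document on single spaces
tokens <- strsplit(text, " ", fixed = TRUE)

# colour the tokens found in the lookup, then glue each document back together
highlighted <- vapply(tokens, function(tok) {
  code <- colour_codes[tok]   # NA for tokens that are not in the list
  hit <- !is.na(code)
  tok[hit] <- paste0("\033[", code[hit], "m", tok[hit], "\033[39m")
  paste(tok, collapse = " ")
}, character(1))

# one line per document; colours show up in terminals that support ANSI codes
cat(highlighted, sep = "\n")
```

Because the lookup is a plain named vector, adding a word is a one-entry change, repeated words are kept, and each document stays a single string.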
Indrajeet's answer is great. This answer builds on it with only a small change.
# split each document into words, dropping empty strings (repeated words are kept)
unique_words <- lapply(strsplit(text, " "), function(x) { x[x != ""] })
# creating a dataframe with crayonized text
df <-
  tibble::enframe(unique_words) %>%
  tidyr::unnest(cols = value) %>%
  # here you can specify the color/word combinations you need
  dplyr::mutate(value2 = dplyr::case_when(value == "text" ~ crayon::blue(value),
                                          value == "words" ~ crayon::red(value),
                                          TRUE ~ value)) %>%
  dplyr::select(-value)
To output the result on two separate lines (see "Collapse text by group in data frame"):
library(data.table)
df <- data.table(df)
df <- df[, list(text = paste(value2, collapse = " ")), by = name]
cat(df$text, sep = "\n")
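The result looks fine when I print it in the R console. How would this work if I wanted to output it in an R Shiny app?
I am looking for other alternatives and would appreciate your help.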
Here is the complete, debugged app code!
First, the required libraries:
library(shiny)
library(tidyverse)
library(DT)
library(magrittr)
Then, the function that adds the HTML tags:
wordHighlight <- function(SuspWord, colH = 'yellow') {
  paste0('<span style="background-color:', colH, '">', SuspWord, '</span>')
}
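As a quick check of what this helper produces, the sketch below calls `wordHighlight()` directly and then applies it to one sentence. The `gsub()` call with a `\\1` backreference is my own base R stand-in for the `str_replace_all()` call used in the app, chosen only so the example runs without extra packages, and the sample sentence is made up.

```r
# helper copied from the answer
wordHighlight <- function(SuspWord, colH = 'yellow') {
  paste0('<span style="background-color:', colH, '">', SuspWord, '</span>')
}

# direct call: wraps the word in a <span> with a yellow background
tagged <- wordHighlight("data")
print(tagged)

# base R stand-in for the app's str_replace_all() call (an assumption, not the app's code):
# "\\1" is replaced by whatever the capture group matched, so the original case is kept
sentence <- "Data science uses data."
highlighted <- gsub("(data)", wordHighlight("\\1"), sentence, ignore.case = TRUE)
print(highlighted)
```

The second result keeps "Data" and "data" in their original case because the replacement reuses the matched text rather than the search term.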
Now the UI part:
ui <- fluidPage(
  titlePanel("Text Highlighting"),
  sidebarLayout(
    sidebarPanel(
      textInput("wordSearch", "Word Search")
    ),
    mainPanel(
      DT::dataTableOutput("table")
    )
  )
)
Finally, the server side:
server <- function(input, output) {
  sentence <- "The term 'data science' (originally used interchangeably with 'datalogy') has existed for over thirty years and was used initially as a substitute for computer science by Peter Naur in 1960."
  sentence2 <- "One of the things we will want to do most often for social science analyses of text data is generate a document-term matrix."
  YourData <- data.frame(N = c('001', '002'), T = c(sentence, sentence2), stringsAsFactors = FALSE)
  # wrap every match of the search term in a highlighting <span>
  highlightData <- reactive({
    if (input$wordSearch != "") {
      patterns <- input$wordSearch
      YourData2 <- YourData
      YourData2[, 2] %<>% str_replace_all(regex(patterns, ignore_case = TRUE), wordHighlight)
      return(YourData2)
    }
    return(YourData)
  })
  # escape = FALSE lets DT render the <span> tags as HTML instead of showing them as text
  output$table <- DT::renderDataTable({
    highlightData()
  }, escape = FALSE)
}
Run the app:
shinyApp(ui = ui, server = server)
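The app above highlights a single search term in one color, whereas the original question asks for a pre-defined list of words with one color each. A possible way to bridge the two is sketched below. The names `highlight_html` and `colours` are hypothetical, and the sketch uses base R `gregexpr()` and `regmatches<-` rather than stringr. It assumes the words contain no regex metacharacters (they would need escaping otherwise). All words are matched in a single pass, so the inserted `<span>` markup is never searched again.

```r
# hypothetical helper: colour each listed word with its own colour via HTML <span> tags
highlight_html <- function(x, colours) {
  # one alternation pattern for all words, restricted to whole words
  pattern <- paste0("\\b(", paste(names(colours), collapse = "|"), ")\\b")
  m <- gregexpr(pattern, x, perl = TRUE)
  # replace every match with a <span> carrying the colour looked up for that word
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    if (length(hits) == 0) return(character(0))
    paste0('<span style="color:', colours[hits], '">', hits, '</span>')
  })
  x
}

text <- c("is it possible to highlight text for some words",
          "suppose i want words like words to be red and words like text to be blue")

# pre-defined list: word -> colour
colours <- c(words = "red", text = "blue")

html <- highlight_html(text, colours)
print(html[1])
```

In the Shiny app, a function like this could be applied to the text column inside the reactive in place of the `str_replace_all()` call, with `escape = FALSE` kept so that DT renders the tags.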