In R, how do I wrap text around every word in a string except one specific word (going from left to right)? Iteration and string manipulation
I know my question is a bit vague, so here is an example of what I am trying to do.
input <- c('I go to school')
#Output
'"I " * phantom("go to school")'
'phantom("I ") * "go" * phantom("to school")'
'phantom("I go ") * "to" * phantom("school")'
'phantom("I go to ") * "school"'
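I have written a function that does produce the output above, but I am having a lot of trouble making it work for strings with a different number of words, and I cannot figure out how to use iteration to cut down on the copied code.
At the moment my function only works for strings of exactly 4 words, and it does not use iteration.
My main questions are: how do I add iteration to my function, and how can I make it work for any number of words?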
library(stringr) # provides str_split() and str_c()
add_phantom <- function(stuff){
strings <- c()
stuff <- str_split(stuff, ' ')
strings[1] <- str_c('"', stuff[[1]][[1]], ' "', ' * ',
'phantom("', str_c(stuff[[1]][[2]], stuff[[1]][[3]], stuff[[1]][[4]], sep = ' '), '")')
strings[2] <- str_c('phantom("', stuff[[1]][[1]], ' ")',
' * "', stuff[[1]][[2]], '" * ',
'phantom("', str_c(stuff[[1]][[3]], stuff[[1]][[4]], sep = ' '), '")')
strings[3] <- str_c('phantom("', str_c(stuff[[1]][[1]], stuff[[1]][[2]], sep = ' '), ' ")',
' * "', stuff[[1]][[3]], '" * ',
'phantom("', stuff[[1]][[4]], '")')
strings[4] <- str_c('phantom("', str_c(stuff[[1]][[1]], stuff[[1]][[2]], stuff[[1]][[3]], sep = ' '), ' ")',
' * "', stuff[[1]][[4]], '"')
return(strings)
}
It's a bit of a hack job, but it gives the expected output :)
input <- c('I go to school')
library(purrr)
inp <- c(list(NULL),strsplit(input," ")[[1]])
phantomize <- function(x,leftside = T){
if(length(x)==1) return("")
if(leftside)
ph <- paste0('phantom("',paste(x[-1],collapse=" "),' ") * ') else
ph <- paste0(' * phantom("',paste(x[-1],collapse=" "),'")')
ph
}
map(1:(length(inp)-1),
~paste0(phantomize(inp[1:.x]),
inp[[.x+1]],
phantomize(inp[(.x+1):length(inp)],F)))
# [[1]]
# [1] "I * phantom(\"go to school\")"
#
# [[2]]
# [1] "phantom(\"I \") * go * phantom(\"to school\")"
#
# [[3]]
# [1] "phantom(\"I go \") * to * phantom(\"school\")"
#
# [[4]]
# [1] "phantom(\"I go to \") * school"
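Note that the output above leaves the visible word unquoted, while the example in the question quotes it and gives the first word a trailing space. As a minimal sketch of one way to reproduce the question's format exactly in base R: the helper name `add_phantom_all` is hypothetical, and the sketch assumes words are separated by single spaces and contain no double quotes.

```r
# Hypothetical helper (not from the original thread): returns one string per
# word, with that word left visible and the rest wrapped in phantom().
# Assumes single-space separators and no double quotes inside the words.
add_phantom_all <- function(sentence) {
  words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  n <- length(words)
  vapply(seq_len(n), function(i) {
    pieces <- character(0)
    if (i > 1) {
      # Everything to the left of word i, with the trailing space kept
      before <- paste(words[seq_len(i - 1)], collapse = " ")
      pieces <- c(pieces, sprintf('phantom("%s ")', before))
    }
    # The question's format gives only the first word a trailing space
    visible <- if (i == 1 && n > 1) paste0(words[i], " ") else words[i]
    pieces <- c(pieces, sprintf('"%s"', visible))
    if (i < n) {
      # Everything to the right of word i
      after <- paste(words[(i + 1):n], collapse = " ")
      pieces <- c(pieces, sprintf('phantom("%s")', after))
    }
    paste(pieces, collapse = " * ")
  }, character(1))
}

add_phantom_all("I go to school")
```

Because the pieces are collected in a vector and joined with " * " at the end, the first and last words need no special-cased string templates, and the same function handles any number of words.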
This is a bit hacky, but I think it solves your problem:
library(corpus)
input <- 'I go to school'
types <- text_types(input, collapse = TRUE) # all word types
(loc <- text_locate(input, types)) # locate all word types, get context
##   text before  instance after
## 1 1            I         go to school
## 2 1    I       go        to school
## 3 1    I go    to        school
## 4 1    I go to school
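The return value is a data frame with columns of type corpus_text. This approach looks crazy, but it does not actually allocate new strings for the before and after contexts (both of which have type corpus_text).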
Here is the output you wanted:
paste0("phantom(", loc$before, ") *", loc$instance, "* phantom(", loc$after, ")")
## [1] "phantom() *I* phantom( go to school)"
## [2] "phantom(I ) *go* phantom( to school)"
## [3] "phantom(I go ) *to* phantom( school)"
## [4] "phantom(I go to ) *school* phantom()"
If you really want to get crazy and ignore punctuation:
phantomize <- function(input, ...) {
types <- text_types(input, collapse = TRUE, ...)
loc <- text_locate(input, types, ...)
paste0("phantom(", loc$before, ") *", loc$instance, "* phantom(",
loc$after, ")")
}
phantomize("I! go to school (?)...don't you?", drop_punct = TRUE)
## [1] "phantom() *I* phantom(! go to school (?)...don't you?)"
## [2] "phantom(I! ) *go* phantom( to school (?)...don't you?)"
## [3] "phantom(I! go ) *to* phantom( school (?)...don't you?)"
## [4] "phantom(I! go to ) *school* phantom( (?)...don't you?)"
## [5] "phantom(I! go to school (?)...) *don't* phantom( you?)"
## [6] "phantom(I! go to school (?)...don't ) *you* phantom(?)"
I would suggest something like this:
library(tidyverse)
library(glue)
test_string <- "i go to school"
str_split(test_string, " ") %>%
map(~str_split(test_string, .x, simplify = T)) %>%
flatten() %>%
map(str_trim) %>%
keep(~.x != "") %>%
map(~glue("phantom({string})", string = .x))
This snippet can easily be wrapped in a function, and it returns the following output.
[[1]]
phantom(i)
[[2]]
phantom(i go)
[[3]]
phantom(i go to)
[[4]]
phantom(go to school)
[[5]]
phantom(to school)
[[6]]
phantom(school)
I may have misunderstood your question; I am not quite sure whether you actually want the output in the same format as your example output.