How to delete the words before and after a specific word in a string, using R?
I have the following df:
structure(list(id = c(9L, 10L, 11L, 96L, 97L, 101L, 103L, 248L,
499L, 1044L), leg_activity = c("home, adpt, shop, car_passenger, home, adpt, work, adpt, home pt,, work pt,, outside, outside, outside pt,, outside pt,, pt, home",
"home pt,, pt, outside, outside, outside, outside pt,, pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home",
"home pt,, work, adpt, home", "home, car, work, car, home pt,, work, adpt, home",
"home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, car_passenger, outside, outside, outside, car_passenger, outside, outside, outside, car_passenger, home",
"home, bike, outside, outside, outside, car_passenger, outside, outside, outside, car_passenger, outside, outside, outside, bike, home, adpt, leisure, adpt, home, bike, leisure, bike, home",
"home, adpt, work, adpt, home, walk, other, pt, home", "home, adpt, work, walk, home, adpt, work, walk, home",
"home, adpt, leisure, adpt, home, bike, outside, outside, outside, bike, home",
"home, pt, work, adpt, home, adpt, work, adpt, home")), row.names = c(NA,
10L), class = "data.frame")
As you can see, the leg_activity column contains strings. What I want is to delete all the words associated with the word outside.
To be more specific, take this hypothetical row as an example:
"home, bike, outside, outside, outside, car_passenger, outside, outside, bike, home, adpt, bike, leisure, bike, home"
The goal is to delete the word before outside and the word after outside, and finally outside itself should be deleted as well. Desired output:
"home, home, adpt, bike, leisure, bike, home"
So far I have only managed to delete one specific word:
agents$leg_activity <- gsub(', home', '', agents$leg_activity)
Thank you very much for your help!
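We can split the string on commas, use grep to find the positions where "outside" occurs, and remove those values together with the ones immediately before and after them.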
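agents$new_col <- sapply(strsplit(agents$leg_activity, ',{1,}\\s'), function(x) {
  # positions of the tokens that contain "outside"
  inds <- grep('outside', x)
  # drop each match together with its left and right neighbour
  if (length(inds)) toString(x[-unique(c(inds - 1, inds, inds + 1))])
  else toString(x)
})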
agents$new_col
# [1] "home, adpt, shop, car_passenger, home, adpt, work, adpt, home pt, home"
# [2] "home pt, home, car, leisure, car, other, car, leisure, car, leisure, car, other, car, leisure, car, other, car, leisure, car, home, adpt, leisure, adpt, home"
# [3] "home pt, work, adpt, home"
# [4] "home, car, work, car, home pt, work, adpt, home"
# [5] "home, adpt, work, car_passenger, leisure, car_passenger, work, adpt, home, home"
# [6] "home, home, adpt, leisure, adpt, home, bike, leisure, bike, home"
# [7] "home, adpt, work, adpt, home, walk, other, pt, home"
# [8] "home, adpt, work, walk, home, adpt, work, walk, home"
# [9] "home, adpt, leisure, adpt, home, home"
#[10] "home, pt, work, adpt, home, adpt, work, adpt, home"
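As a quick check, the same steps can be applied to the hypothetical row from the question. This is only a minimal sketch: the helper name drop_around and its word argument are made up for illustration and are not part of the original answer. It also trims the index vector to valid positions, which is an extra precaution rather than something the approach strictly requires.

```r
# drop_around() is a hypothetical helper; it follows the same
# split / grep / negative-index steps as the sapply() call above.
drop_around <- function(s, word = "outside") {
  # split on one or more commas followed by a whitespace character
  x <- strsplit(s, ",{1,}\\s")[[1]]
  # positions of the tokens that contain the word
  inds <- grep(word, x, fixed = TRUE)
  if (length(inds) == 0) return(toString(x))
  # each match plus its left and right neighbour, limited to valid positions
  drop <- unique(c(inds - 1, inds, inds + 1))
  drop <- drop[drop >= 1 & drop <= length(x)]
  toString(x[-drop])
}

row <- "home, bike, outside, outside, outside, car_passenger, outside, outside, bike, home, adpt, bike, leisure, bike, home"
result <- drop_around(row)
print(result)
# [1] "home, home, adpt, bike, leisure, bike, home"
```

Note that grep matches any token containing "outside", so a token such as "outside pt" counts as a match too, which is what the output for rows 1 and 2 above relies on.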