Is there a way to replace the words in a vector with numbers from a specific source?

Question

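I have been struggling to replace the words in the Comments variable (data1) with specific numbers. For example, every time the word "tractor" appears, it should be replaced by "1"; every time the word "trailer" appears, it should be replaced by "2", and so on. I finally managed to build the following two data sets. The first contains my comments, and the second contains all the important words with their numbers. What I need now is to replace each word in the data1 comments that appears in data2 with its corresponding number from data2.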

data1<- structure(list(Direction = c("W", "W", "E"), Comments = list(
    "tractor trailer struck by car that left scene // delayed response due to previous assists // ma unit called off incident", 
    "incident cleared before ma unit arrived on scene", "crash on union south of i-70 // 3 city tow on scene. // no state damage  // all cleared from roadway ")), row.names = 3:5, class = "data.frame")


data2<- structure(list(Content = c("tractor", "trailer", "struck", "car", 
"left", "scene", "delayed", "response", "due", "previous", "assists", 
"unit", "called", "incident", "cleared", "arrived", "crash", 
"union", "south", "i-70", "city", "tow", "state", "damage", "roadway"
), number = 1:25), row.names = c(NA, 25L), class = "data.frame")

My end goal is to see the comments as numbers instead of words. Thanks in advance for your help.

Answer 1

Here is a way to do it with gsub(), looping over the rows of data2:

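# Replace each whole word listed in data2$Content with its number, one word at a time
for (i in seq_len(nrow(data2))) {
  # "\\<" and "\\>" match the start and end of a word ("\<" alone is an R syntax error)
  pat <- paste0("\\<", data2$Content[i], "\\>")
  data1$Comments <- gsub(pat, data2$number[i], data1$Comments)
}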
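Why the pattern is wrapped in `\<` and `\>` (written "\\<" and "\\>" inside an R string): in R's default regular expressions these match the empty string at the start and end of a word, so a short lookup entry cannot replace part of a longer word. A minimal illustration with a made-up sentence that is not from the question:

```r
# Made-up sentence for illustration: "car" is a lookup word, "cargo" is not.
txt <- "car hit a cargo van"

# Plain pattern: "car" also matches the first three letters of "cargo".
plain <- gsub("car", 4L, txt)
plain
#> [1] "4 hit a 4go van"

# With word anchors, only the stand-alone word "car" is replaced.
bounded <- gsub("\\<car\\>", 4L, txt)
bounded
#> [1] "4 hit a cargo van"
```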
data1
#>   Direction                                                           Comments
#> 3         W           1 2 3 by 4 that 5 6 // 7 8 9 to 10 11 // ma 12 13 off 14
#> 4         W                                         14 15 before ma 12 16 on 6
#> 5         E 17 on 18 19 of 20 // 3 21 22 on 6. // no 23 24  // all 15 from 25

Created on 2022-04-28 by the reprex package (v2.0.1)
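A single-pass alternative, sketched here in base R rather than taken from the answer above: tokenise every comment once with gregexpr(), look all tokens up in data2 with one match() call, and write them back with `regmatches<-`. The token definition `[[:alnum:]-]+` is an assumption, chosen so that "i-70" stays one word and the "." in "scene." stays outside it; on the sample data it produces the same strings as the gsub() loop.

```r
# Data from the question (data2 rebuilt with data.frame() for brevity).
data1 <- structure(list(Direction = c("W", "W", "E"), Comments = list(
  "tractor trailer struck by car that left scene // delayed response due to previous assists // ma unit called off incident",
  "incident cleared before ma unit arrived on scene",
  "crash on union south of i-70 // 3 city tow on scene. // no state damage  // all cleared from roadway ")),
  row.names = 3:5, class = "data.frame")

data2 <- data.frame(
  Content = c("tractor", "trailer", "struck", "car", "left", "scene", "delayed",
              "response", "due", "previous", "assists", "unit", "called",
              "incident", "cleared", "arrived", "crash", "union", "south",
              "i-70", "city", "tow", "state", "damage", "roadway"),
  number = 1:25
)

# Assumption: a "word" is a run of letters, digits and hyphens.
comments <- unlist(data1$Comments)
m <- gregexpr("[[:alnum:]-]+", comments)

# Look every token up in data2; tokens not in the table ("ma", "by", "3", ...)
# are left unchanged, and the text between tokens ("//", ".", spaces) is kept.
regmatches(comments, m) <- lapply(regmatches(comments, m), function(words) {
  idx <- match(words, data2$Content)
  words[!is.na(idx)] <- data2$number[idx[!is.na(idx)]]
  words
})

data1$Comments <- comments
data1
```

Because the lookup happens in a single match() per comment, the cost does not grow with one regex pass per dictionary entry, which may matter if data2 becomes large.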
