How to identify pattern-matching indices across columns in lists

I am working with speech data, stored in a column called Turn, and its part-of-speech tags, stored in a column called c5:

df_test <- data.frame(
  Turn = c("we 're not gon na know the person",
           "it 's gon na rain"),
  c5 = c("PNP VBB XX0 VVG TO0 VVI AT0 NN1",
         "PNP VBZ VVG TO0 VVI"), stringsAsFactors = FALSE
)

I want to identify the index of the c5 value that corresponds to the string gon. To that end, I used str_split to break both Turn and c5 into 'word units':

library(stringr)
df_test$Turns_split <- lapply(df_test$Turn, function(x) unlist(str_split(x, " ")))
df_test$c5_split <- lapply(df_test$c5, function(x) unlist(str_split(x, " ")))

This works fine. The problem is identifying the matching indices: I do not get an error, but I do not get the desired indices either:

df_test$Index_matches <-  lapply(df_test[,3:4], function(x) match(which(df_test[,3]=="gon"), seq(df_test[,4])))
df_test
                               Turn                              c5                              Turns_split
1 we 're not gon na know the person PNP VBB XX0 VVG TO0 VVI AT0 NN1 we, 're, not, gon, na, know, the, person
2                 it 's gon na rain             PNP VBZ VVG TO0 VVI                    it, 's, gon, na, rain
                                c5_split Index_matches
1 PNP, VBB, XX0, VVG, TO0, VVI, AT0, NN1              
2                PNP, VBZ, VVG, TO0, VVI 

The desired result is:

df_test
                               Turn                              c5                              Turns_split
1 we 're not gon na know the person PNP VBB XX0 VVG TO0 VVI AT0 NN1 we, 're, not, gon, na, know, the, person
2                 it 's gon na rain             PNP VBZ VVG TO0 VVI                    it, 's, gon, na, rain
                                c5_split Index_matches
1 PNP, VBB, XX0, VVG, TO0, VVI, AT0, NN1             4
2                PNP, VBZ, VVG, TO0, VVI             3

How can this result be obtained?

In base R, look up the position of "gon" within each split turn:

df_test$Index_matches <-  sapply(df_test$Turns_split, function(x) which(x %in% "gon"))

> df_test$Index_matches 
[1] 4 3
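
Since the question asks for the c5 value that corresponds to gon, the index can also be used to pull out the tag itself. The following is only a sketch, not a definitive solution: it rebuilds the example data, uses base strsplit in place of str_split, and the names turns_split, c5_split, index_matches and c5_at_match are illustrative. It assumes each turn has exactly as many words as tags. Using lapply instead of sapply keeps the result a list, so turns with no match or several matches are handled as well.

```r
# Rebuild the example data from the question
df_test <- data.frame(
  Turn = c("we 're not gon na know the person",
           "it 's gon na rain"),
  c5 = c("PNP VBB XX0 VVG TO0 VVI AT0 NN1",
         "PNP VBZ VVG TO0 VVI"),
  stringsAsFactors = FALSE
)

# Split words and tags on single spaces (base R equivalent of str_split)
turns_split <- strsplit(df_test$Turn, " ", fixed = TRUE)
c5_split <- strsplit(df_test$c5, " ", fixed = TRUE)

# Position(s) of "gon" in each turn; integer(0) if the turn has no match
index_matches <- lapply(turns_split, function(words) which(words == "gon"))

# Tag(s) at those positions, pairing each tag vector with its own indices
c5_at_match <- Map(function(tags, idx) tags[idx], c5_split, index_matches)

print(unlist(index_matches))
print(unlist(c5_at_match))
```

For the two example turns this yields the indices 4 and 3, and the tag VVG in both cases.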