How to identify pattern-matching indices across columns in lists
I am working with speech data, stored in a column called Turn, and its part-of-speech tags, stored in a column called c5:
df_test <- data.frame(
Turn = c("we 're not gon na know the person",
"it 's gon na rain"),
c5 = c("PNP VBB XX0 VVG TO0 VVI AT0 NN1",
"PNP VBZ VVG TO0 VVI"), stringsAsFactors = FALSE
)
I want to identify the index of the c5 value that corresponds to the string gon. To do this, I used str_split to break both Turn and c5 into 'word units':
library(stringr)
df_test$Turns_split <- lapply(df_test$Turn, function(x) unlist(str_split(x, " ")))
df_test$c5_split <- lapply(df_test$c5, function(x) unlist(str_split(x, " ")))
That part works fine. The problem is identifying the matching indices: I don't get an error, but I don't get the indices I want either:
df_test$Index_matches <- lapply(df_test[,3:4], function(x) match(which(df_test[,3]=="gon"), seq(df_test[,4])))
df_test
Turn c5 Turns_split
1 we 're not gon na know the person PNP VBB XX0 VVG TO0 VVI AT0 NN1 we, 're, not, gon, na, know, the, person
2 it 's gon na rain PNP VBZ VVG TO0 VVI it, 's, gon, na, rain
c5_split Index_matches
1 PNP, VBB, XX0, VVG, TO0, VVI, AT0, NN1
2 PNP, VBZ, VVG, TO0, VVI
The correct result would be:
df_test
Turn c5 Turns_split
1 we 're not gon na know the person PNP VBB XX0 VVG TO0 VVI AT0 NN1 we, 're, not, gon, na, know, the, person
2 it 's gon na rain PNP VBZ VVG TO0 VVI it, 's, gon, na, rain
c5_split Index_matches
1 PNP, VBB, XX0, VVG, TO0, VVI, AT0, NN1 4
2 PNP, VBZ, VVG, TO0, VVI 3
How can this result be obtained?
With base R:
df_test$Index_matches <- sapply(df_test$Turns_split, function(x) which(x %in% "gon"))
> df_test$Index_matches
[1] 4 3
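Since the stated goal is the c5 value that lines up with gon, the indices can also be used to pull out the tags themselves. The following is only a minimal sketch under a few assumptions: it swaps in base R's strsplit() for str_split(), uses lapply() instead of sapply() so that rows with no match or several matches still yield a list, and stores the tags in a column named c5_matches, which is a hypothetical name not used in the question.

```r
# Rebuild the example data from the question (base R only, no stringr needed).
df_test <- data.frame(
  Turn = c("we 're not gon na know the person",
           "it 's gon na rain"),
  c5 = c("PNP VBB XX0 VVG TO0 VVI AT0 NN1",
         "PNP VBZ VVG TO0 VVI"),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row.
df_test$Turns_split <- strsplit(df_test$Turn, " ", fixed = TRUE)
df_test$c5_split <- strsplit(df_test$c5, " ", fixed = TRUE)

# lapply() always returns a list, so rows with zero or several matches are safe.
df_test$Index_matches <- lapply(df_test$Turns_split, function(x) which(x == "gon"))

# Pair each row's tags with its indices to get the tag(s) aligned with "gon".
# c5_matches is a hypothetical column name chosen for this sketch.
df_test$c5_matches <- Map(function(tags, idx) tags[idx],
                          df_test$c5_split, df_test$Index_matches)
```

With the two example rows, Index_matches holds 4 and 3, and c5_matches holds VVG for both rows. This relies on Turn and c5 having the same number of space-separated tokens in each row, as they do in the example data.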