How to manipulate substrings in one column based on indices in another column
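I want to manipulate substrings in one column based on the indices of these substrings, which are stored in another column of the data frame:
Data: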
df_test
Turn c5 Turns_split
1 we 're not gon na know the person PNP VBB XX0 VVG TO0 VVI AT0 NN1 we, 're, not, gon, na, know, the, person
2 great answer AJ0 NN1 great, answer
3 it 's gon na rain PNP VBZ VVG TO0 VVI it, 's, gon, na, rain
c5_split Index
1 PNP, VBB, XX0, VVG, TO0, VVI, AT0, NN1 4
2 AJ0, NN1
3 PNP, VBZ, VVG, TO0, VVI 3
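The indices (values 4 and 3) are stored in the column Index; the substrings I want to manipulate are stored in c5, which contains part-of-speech tags. The manipulation targets two substrings in c5: (i) the substring whose position matches the value in Index and (ii) the substring right after it, i.e., the one at position Index + 1. The operation I want to perform is to replace the whitespace between these two substrings with an = sign. So the desired output in column c5 is this: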
df_test$c5
"PNP VBB XX0 VVG=TO0 VVI AT0 NN1" "AJ0 NN1" "PNP VBZ VVG=TO0 VVI"
I really don't know how to go about this, so guidance is much appreciated.
Reproducible data:
df_test <- structure(list(Turn = c("we 're not gon na know the person",
"great answer", "it 's gon na rain"), c5 = c("PNP VBB XX0 VVG TO0 VVI AT0 NN1",
"AJ0 NN1", "PNP VBZ VVG TO0 VVI"), Turns_split = list(c("we",
"'re", "not", "gon", "na", "know", "the", "person"), c("great",
"answer"), c("it", "'s", "gon", "na", "rain")), c5_split = list(
c("PNP", "VBB", "XX0", "VVG", "TO0", "VVI", "AT0", "NN1"),
c("AJ0", "NN1"), c("PNP", "VBZ", "VVG", "TO0", "VVI")), Index = list(
4L, integer(0), 3L)), row.names = c(NA, -3L), class = "data.frame")
Try this:
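for (i in seq_len(nrow(df_test))) {
  idx <- df_test$Index[[i]]
  # Skip rows that have no index
  if (length(idx) == 0) next
  # Split the tag string into individual tags
  s <- unlist(strsplit(df_test$c5[i], split = " "))
  # Join the tag at idx and the one after it with "="
  s[idx] <- paste0(s[idx], "=", s[idx + 1])
  # Drop the following tag (now merged) and rebuild the string
  df_test$c5[i] <- paste(s[-(idx + 1)], collapse = " ")
}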
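If you would rather avoid the explicit loop, the same idea can be wrapped in a small function and applied row by row with mapply. This is only a sketch: join_at_index and df_demo are names made up for illustration, and the sketch assumes each entry of Index holds at most one position. It also leaves a string untouched when the index is missing or points at the last tag, which the question does not specify.

```r
# Hypothetical helper (not from the original answer): joins the tag at
# position `idx` and the tag after it with "=".
join_at_index <- function(tags, idx) {
  parts <- strsplit(tags, " ", fixed = TRUE)[[1]]
  # Return the string unchanged if there is no usable index
  # or no following tag to join with.
  if (length(idx) == 0 || is.na(idx) || idx < 1 || idx >= length(parts)) {
    return(tags)
  }
  parts[idx] <- paste0(parts[idx], "=", parts[idx + 1])
  paste(parts[-(idx + 1)], collapse = " ")
}

# Small stand-in for the c5 and Index columns of df_test.
df_demo <- data.frame(
  c5 = c("PNP VBB XX0 VVG TO0 VVI AT0 NN1", "AJ0 NN1", "PNP VBZ VVG TO0 VVI"),
  stringsAsFactors = FALSE
)
df_demo$Index <- list(4L, integer(0), 3L)

# Apply the helper to each (c5, Index) pair; unname() drops the
# names that mapply derives from the character input.
df_demo$c5 <- unname(mapply(join_at_index, df_demo$c5, df_demo$Index))
df_demo$c5
```

On the data from the question, df_test$c5 <- unname(mapply(join_at_index, df_test$c5, df_test$Index)) should do the same thing.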