Find values to the left and right of the last period with regex and separate in dplyr
I have a data frame with column names like the following:
[127] "quiz.32.player.submitted_answer_private" "quiz.32.player.rescue_event"
[129] "quiz.33.player.solution" "quiz.33.player.submitted_answer"
[131] "quiz.33.player.submitted_answer_private" "quiz.33.player.rescue_event"
[133] "partner_quiz.1.player.solution" "partner_quiz.1.player.submitted_answer"
[135] "partner_quiz.1.player.submitted_answer_private" "partner_quiz.1.player.rescue_event"
[137] "partner_quiz.2.player.solution" "partner_quiz.2.player.submitted_answer"
[139] "partner_quiz.2.player.submitted_answer_private" "partner_quiz.2.player.rescue_event"
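I'm trying to split these names by extracting the part to the left of the last period and the part to the right of it. My dplyr pipeline for this is: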
frame <- data %>%
  gather(k, value) %>%
  # split at the period that is directly followed by "player"
  separate(k, into = c("quiz_number", "suffix"), sep = "\\.(?=player)")
For some reason, the resulting data.frame leaves out all the columns prefixed with "partner_". Any ideas?
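Edit: the resulting split should put everything to the left of the last period in the "quiz_number" column (e.g. quiz.32.player and partner_quiz.2.player) and everything to the right of the last period in the "suffix" column (e.g. submitted_answer_private and solution).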
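For reference, the split described in the edit can be sketched in base R without a lookahead, using a greedy capture group. This is a minimal sketch: nms is a hypothetical sample of the column names above, and it assumes every name contains at least one period (a name without one is returned unchanged in both columns).

```r
# Hypothetical sample of the column names shown above
nms <- c("quiz.32.player.submitted_answer_private",
         "quiz.33.player.solution",
         "partner_quiz.1.player.rescue_event",
         "partner_quiz.2.player.submitted_answer")

# The greedy (.*) runs as far right as it can, so the literal \\. that must
# still match after it is the LAST period; ([^.]*) captures the remainder.
pattern <- "^(.*)\\.([^.]*)$"
quiz_number <- sub(pattern, "\\1", nms)
suffix      <- sub(pattern, "\\2", nms)

data.frame(quiz_number, suffix)
```

Because the greedy group backtracks only as far as the final period, names with any number of earlier periods, including the partner_ ones, are split in the same place.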
Instead of 'player' in the regex lookahead, use a positive lookahead for one or more characters that are not '.' up to the end of the string ($):
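library(dplyr)
library(tidyr)

data %>%
  gather(k, value) %>%
  # split at the period followed only by non-period characters up to the
  # end of the string, i.e. the last period in each name
  separate(k, into = c("quiz_number", "suffix"), sep = "\\.(?=[^.]+$)")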
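To see which period the lookahead pattern picks, it can be located directly with base R's regexpr(). Note that base R's default regex engine does not support lookaheads, so perl = TRUE is needed here; nms is a hypothetical example vector, and the sketch assumes each name contains at least one period.

```r
nms <- c("quiz.32.player.rescue_event", "partner_quiz.2.player.solution")

# Position of the period that is followed only by non-period characters
# until the end of the string, i.e. the last period
pos <- as.integer(regexpr("\\.(?=[^.]+$)", nms, perl = TRUE))

quiz_number <- substr(nms, 1, pos - 1)           # everything before it
suffix      <- substr(nms, pos + 1, nchar(nms))  # everything after it

pos  # 15 22
```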
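In the OP's code, the regex matches the '.' before the string 'player', but there is another '.' after 'player' (e.g. quiz.32.player.rescue_event), so that name is split into quiz.32 and player.rescue_event instead of at the last period.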
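The difference between the two patterns can be checked on a single name with base R's strsplit(), again with perl = TRUE for the lookahead. This is only an illustration: x is a hypothetical example value, and separate() performs its own splitting internally.

```r
x <- "quiz.32.player.rescue_event"

# OP's pattern: the period directly before "player" (not the last one)
old <- strsplit(x, "\\.(?=player)", perl = TRUE)[[1]]

# Answer's pattern: the period followed only by non-period characters
new <- strsplit(x, "\\.(?=[^.]+$)", perl = TRUE)[[1]]

old  # c("quiz.32", "player.rescue_event")
new  # c("quiz.32.player", "rescue_event")
```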