Using pivot_longer with multiple column classes
I have a dataset with this structure (survey respondents were asked many questions), and I would like to reshape it from wide to long:
library(tidyverse)
df_wide <-
tribble(
~resp_id, ~question_1_info, ~question_1_answer, ~question_2_info, ~question_2_answer,
1, "What is your eye color?", 1, "What is your hair color?", 2,
2, "Are you over 6 ft tall?", 1, "", NA,
3, "What is your hair color?", 0, "Are you under 40?", 1
)
Here is my desired output:
df_long <-
tribble(
~resp_id, ~question_number, ~question_text, ~question_answer,
1, 1, "What is your eye color?", 1,
1, 2, "What is your hair color?", 2,
2, 1, "Are you over 6 ft tall?", 1,
2, 2, "", NA,
3, 1, "What is your hair color?", 0,
3, 2, "Are you under 40?", 1
)
I am having trouble getting columns of multiple classes to work together. Here is what I have tried:
df_wide %>%
pivot_longer(
cols = !resp_id,
names_to = c("question_number"),
names_prefix = "question_",
values_to = c("question_text", "question_answer")
)
I can't work out the right configuration of names_to, names_prefix, and values_to.
We can use names_pattern after rearranging the substrings in the column names:
library(dplyr)
library(tidyr)
library(stringr)
df_wide %>%
  # Move the question number to the end of each column name.
  # "_(\\d+)(_.*)" captures the digits (\\d+) after the first _
  # and the rest of the name (_.*); the replacement "\\2_\\1" swaps
  # the two captured groups, e.g. question_1_info -> question_info_1
  rename_with(~ str_replace(., "_(\\d+)(_.*)", "\\2_\\1"), -resp_id) %>%
  pivot_longer(cols = -resp_id, names_to = c(".value", "question_number"),
               names_pattern = "(.*)_(\\d+$)")
-output
# A tibble: 6 × 4
resp_id question_number question_info question_answer
<dbl> <chr> <chr> <dbl>
1 1 1 "What is your eye color?" 1
2 1 2 "What is your hair color?" 2
3 2 1 "Are you over 6 ft tall?" 1
4 2 2 "" NA
5 3 1 "What is your hair color?" 0
6 3 2 "Are you under 40?" 1
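Note that the output above names the text column question_info and keeps question_number as character, whereas the desired df_long has question_text and a numeric question_number. As a possible variation (a sketch, not the answer's own method), the renaming step can be skipped by letting names_pattern capture the number first and the suffix second, then renaming the resulting info and answer columns. This assumes a tidyr version that supports names_transform (1.1.0 or later, as far as I know).

```r
library(tibble)
library(dplyr)
library(tidyr)

df_wide <- tribble(
  ~resp_id, ~question_1_info, ~question_1_answer, ~question_2_info, ~question_2_answer,
  1, "What is your eye color?", 1, "What is your hair color?", 2,
  2, "Are you over 6 ft tall?", 1, "", NA,
  3, "What is your hair color?", 0, "Are you under 40?", 1
)

df_long <- df_wide %>%
  pivot_longer(
    cols = -resp_id,
    # first capture group -> question_number, second -> output column name
    names_to = c("question_number", ".value"),
    names_pattern = "question_(\\d+)_(.*)",
    # assumption: names_transform is available in the installed tidyr
    names_transform = list(question_number = as.numeric)
  ) %>%
  # .value produces columns called info and answer; rename to the desired names
  rename(question_text = info, question_answer = answer)

df_long
```

Because .value takes the column names from the suffix, the character info columns and the numeric answer columns each keep their own class, which is what makes mixing classes work here.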