How to pivot from wide to long format based on multiple column name separators?
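I know there are similar posts, but I am a bit confused about how to use pivot_longer to convert my own data from wide to long format. The code below creates a dummy dataset with a structure similar to my real data.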
library(tidyverse)
## Dummy data.
# ID Variables.
part <- rep(rep(paste0("P", c(1:2)), each = 20, times = 2))
type <- rep(c("pre", "post"), each = 10, times = 4)
sp <- rep(c("slow", "mod"), each = 40)
# Values
var1_site1_L <- rep(c(1, NA), each = 5, times = 8)
var1_site1_R <- rep(c(1, NA), each = 5, times = 8)
var1_site1_ALL <- rep(1, times = 80)
var1_site1_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)
var2_site2_L <- rep(c(1, NA), each = 5, times = 8)
var2_site2_R <- rep(c(1, NA), each = 5, times = 8)
var2_site2_ALL <- rep(1, times = 80)
var2_site2_ALL_M <- rep(c(1, rep(NA, times = 9)), times = 8)
dat <- data.frame(part, type, sp, var1_site1_L, var1_site1_R, var1_site1_ALL,
var1_site1_ALL_M, var2_site2_L, var2_site2_R, var2_site2_ALL,
var2_site2_ALL_M)
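I would like to keep the variables part, type, and sp as ID variables, and add each underscore-separated piece of the column names as an additional ID variable, with the corresponding values in the last column. For example, I would like the result to look something like this (note that this is only a very basic example; there will of course be many more observations, including those with NA in the value column):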
part type sp var site side misc value
P1 pre slow var1 site1 L NA 1
P1 pre slow var1 site1 R NA 1
P1 pre slow var1 site1 ALL NA 1
P1 pre slow var1 site1 ALL M 1
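I know this is a rather unusual data structure. I am particularly unsure how to handle the fourth column-name piece (M), which is present only in some cases (those where there is a single value per combination of ID variables).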
I have got as far as the code below, and I know it needs some work before it gives the result I want.
long <- dat %>%
pivot_longer(cols = c(1:3),
names_to = c("var", "site", "side", "misc"),
names_sep = "_")
Any help would be greatly appreciated!
I don't think you can get there with pivot_longer, but try this.
library(stringr)
results <- data.frame()
# Loop over the value columns (column 4 onward), one at a time.
for (x in 4:length(dat)) {
  names <- names(dat[, c(1:3, x)])
  res <- dat %>%
    mutate(id = 1:nrow(dat)) %>%
    select(id, all_of(names)) %>%
    # Parse the pieces of the current value column's name.
    mutate(var = str_extract(names[4], "var\\d"),
           site = str_extract(names[4], "site\\d"),
           side = str_extract(names[4], "L|R|ALL"),
           misc = str_extract(names[4], "M"),
           misc = ifelse(is.na(misc), "NA", misc)) %>%
    rename("value" = 5) %>%
    select(id, part, type, sp, var, site, side, misc, value)
  results <- rbind(results, res)
}
head(results %>% arrange(id) %>% select(-id))
part type sp var site side misc value
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
dat %>%
  pivot_longer(starts_with('var')) %>%
  separate(name, c('var', 'site', 'side', 'misc'), fill = 'right')
# A tibble: 640 x 8
part type sp var site side misc value
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
7 P1 pre slow var2 site2 ALL NA 1
8 P1 pre slow var2 site2 ALL M 1
9 P1 pre slow var1 site1 L NA 1
10 P1 pre slow var1 site1 R NA 1
# ... with 630 more rows
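As a side note, separate() is marked as superseded in tidyr 1.3.0 and later, and separate_wider_delim() is the suggested replacement. The sketch below shows what the same idea might look like with it. It assumes tidyr >= 1.3.0, and the small data frame toy_wide and the result name long_toy are made up for illustration; they are not part of the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same naming scheme as `dat`, only smaller.
toy_wide <- data.frame(
  part = c("P1", "P2"),
  var1_site1_L = c(1, NA),
  var1_site1_ALL = c(1, 1),
  var1_site1_ALL_M = c(1, NA)
)

long_toy <- toy_wide %>%
  pivot_longer(starts_with("var")) %>%
  separate_wider_delim(
    name,
    delim = "_",
    names = c("var", "site", "side", "misc"),
    too_few = "align_start"  # names with only three pieces get misc = NA
  )

long_toy
```

Here too_few = "align_start" plays the role that fill = 'right' plays in separate(): short names are padded with NA on the right.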
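I experimented with the results from my earlier solution, running them through pivot_wider and then pivot_longer, and worked out how to make this work with pivot_longer alone. Your original approach was very close. The main change is in cols: it has to select the value columns, not the three ID columns.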
dat %>%
  pivot_longer(
    cols = !c(part, type, sp),
    names_to = c("var", "site", "side", "misc"),
    names_sep = "_",
    values_to = "value"
  )
part type sp var site side misc value
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
1 P1 pre slow var1 site1 L NA 1
2 P1 pre slow var1 site1 R NA 1
3 P1 pre slow var1 site1 ALL NA 1
4 P1 pre slow var1 site1 ALL M 1
5 P1 pre slow var2 site2 L NA 1
6 P1 pre slow var2 site2 R NA 1
7 P1 pre slow var2 site2 ALL NA 1
8 P1 pre slow var2 site2 ALL M 1
9 P1 pre slow var1 site1 L NA 1
10 P1 pre slow var1 site1 R NA 1
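Because some column names have three pieces and others four, names_sep may emit a warning about missing pieces being filled with NA, depending on the tidyr version. One possible alternative, sketched below, is names_pattern with an optional fourth capture group. The regular expression, the toy data frame toy_wide2, and the result name long_toy2 are assumptions made for illustration and are not from the original post.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same naming scheme as `dat`, only smaller.
toy_wide2 <- data.frame(
  part = c("P1", "P2"),
  var1_site1_L = c(1, NA),
  var1_site1_ALL = c(1, 1),
  var1_site1_ALL_M = c(1, NA)
)

long_toy2 <- toy_wide2 %>%
  pivot_longer(
    cols = !part,
    names_to = c("var", "site", "side", "misc"),
    # Three mandatory pieces, then an optional "_" and whatever follows it.
    names_pattern = "^([^_]+)_([^_]+)_([^_]+)_?(.*)$",
    values_to = "value"
  ) %>%
  # A missing fourth piece is captured as an empty string; turn it into NA.
  mutate(misc = na_if(misc, ""))

long_toy2
```

The trade-off is that the pattern hard-codes the expected shape of the names, so a name with a fifth piece would end up with everything after the third underscore in misc.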