Append the dataset with data in certain columns
Suppose this is my test dataset:
Id Col1_ABC Col2_BCD Col3_CBD Col1_ABC_1 Col2_BCD_1 Col3_CBD_1
1 Yes No No Yes Yes No
2 No No No Yes No Yes
3 Yes Yes Yes No Yes No
4 Yes No Yes
5 No No No
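I would like to move the data in the columns with a trailing _1 underneath the data in the columns without the trailing _1 (if that makes sense). The final dataset should look like this: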
Id Col1_ABC Col2_BCD Col3_CBD Status
1 Yes No No Pre
2 No No No Pre
3 Yes Yes Yes Pre
4 Yes No Yes Pre
5 No No No Pre
1 Yes Yes No Post
2 Yes No Yes Post
3 No Yes No Post
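I know a very clumsy way to do this that involves subsetting, renaming, and calling rbind, but I am looking for a more efficient approach. Any suggestions are much appreciated.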
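For reference, the subset, rename, and rbind approach mentioned above might look roughly like the following base R sketch. The names `post_cols`, `pre_cols`, `pre`, `post`, and `result` are illustrative and not from the original post, and the sample data is rebuilt from the table in the question, with the blank cells assumed to be NA.

```r
# Sample data rebuilt from the question (blank cells assumed to be NA)
df1 <- data.frame(
  Id = 1:5,
  Col1_ABC = c("Yes", "No", "Yes", "Yes", "No"),
  Col2_BCD = c("No", "No", "Yes", "No", "No"),
  Col3_CBD = c("No", "No", "Yes", "Yes", "No"),
  Col1_ABC_1 = c("Yes", "Yes", "No", NA, NA),
  Col2_BCD_1 = c("Yes", "No", "Yes", NA, NA),
  Col3_CBD_1 = c("No", "Yes", "No", NA, NA),
  stringsAsFactors = FALSE
)

# Columns with a trailing _1 and the matching names without it
post_cols <- grep("_1$", names(df1), value = TRUE)
pre_cols <- sub("_1$", "", post_cols)

# Subset the two blocks and give the "post" block the "pre" column names
pre <- df1[, c("Id", pre_cols)]
post <- df1[, c("Id", post_cols)]
names(post) <- c("Id", pre_cols)

# Drop "post" rows in which every measurement is missing
post <- post[rowSums(!is.na(post[pre_cols])) > 0, ]

# Label each block and stack them
pre$Status <- "Pre"
post$Status <- "Post"
result <- rbind(pre, post)
rownames(result) <- NULL
result
```

This works, but the column pairing is done by hand, which is what the reshaping functions in the answers below automate.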
We can use pivot_longer from tidyr to reshape the columns from 'wide' to 'long' format.
library(dplyr)
library(tidyr)
library(stringr)
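df1 %>%
  # append _0 to the columns without a numeric suffix so every name ends in _<digit>
  rename_at(-1, ~ str_replace(., '([A-Z])$', '\\1_0')) %>%
  # split each name at the underscore before the digit: value column name + Status
  pivot_longer(cols = -Id, names_to = c(".value", "Status"),
               names_sep = "_(?=[0-9])", values_drop_na = TRUE) %>%
  mutate(Status = factor(Status, levels = 0:1, labels = c("Pre", "Post"))) %>%
  arrange(Status)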
# Id Status Col1_ABC Col2_BCD Col3_CBD
#1 1 Pre Yes No No
#2 2 Pre No No No
#3 3 Pre Yes Yes Yes
#4 4 Pre Yes No Yes
#5 5 Pre No No No
#6 1 Post Yes Yes No
#7 2 Post Yes No Yes
#8 3 Post No Yes No
Another option is melt from data.table, where we specify patterns in measure to convert the data from 'wide' to 'long' format.
library(data.table)
melt(setDT(df1), measure = patterns("Col1_ABC", "Col2_BCD", "Col3_CBD"),
     na.rm = TRUE, variable.name = 'Status',
     value.name = c("Col1_ABC", "Col2_BCD", "Col3_CBD"))[,
       Status := c("Pre", "Post")[Status]][]
# Id Status Col1_ABC Col2_BCD Col3_CBD
#1: 1 Pre Yes No No
#2: 2 Pre No No No
#3: 3 Pre Yes Yes Yes
#4: 4 Pre Yes No Yes
#5: 5 Pre No No No
#6: 1 Post Yes Yes No
#7: 2 Post Yes No Yes
#8: 3 Post No Yes No
Data
df1 <- structure(list(Id = 1:5, Col1_ABC = c("Yes", "No", "Yes", "Yes",
"No"), Col2_BCD = c("No", "No", "Yes", "No", "No"), Col3_CBD = c("No",
"No", "Yes", "Yes", "No"), Col1_ABC_1 = c("Yes", "Yes", "No",
NA, NA), Col2_BCD_1 = c("Yes", "No", "Yes", NA, NA), Col3_CBD_1 = c("No",
"Yes", "No", NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
Another option involving dplyr and tidyr could be:
df1 %>%
  pivot_longer(-Id) %>%
  # flag values from the _1 columns as "Post", then strip the suffix from the name
  mutate(Status = if_else(grepl("_1", name, fixed = TRUE), "Post", "Pre"),
         name = gsub("^([^_]*_[^_]*)_.*$", "\\1", name)) %>%
  pivot_wider(names_from = "name", values_from = "value") %>%
  # drop the rows that contain a missing value
  filter_all(all_vars(!is.na(.))) %>%
  arrange(Status)
Id Status Col1_ABC Col2_BCD Col3_CBD
<int> <chr> <chr> <chr> <chr>
1 1 Post Yes Yes No
2 2 Post Yes No Yes
3 3 Post No Yes No
4 1 Pre Yes No No
5 2 Pre No No No
6 3 Pre Yes Yes Yes
7 4 Pre Yes No Yes
8 5 Pre No No No