Append the dataset with data in certain columns

If this is my test dataset

    Id    Col1_ABC   Col2_BCD    Col3_CBD     Col1_ABC_1    Col2_BCD_1    Col3_CBD_1
    1     Yes        No          No           Yes           Yes           No
    2     No         No          No           Yes           No            Yes
    3     Yes        Yes         Yes          No            Yes           No
    4     Yes        No          Yes
    5     No         No          No

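I would like to move the data in the columns with a trailing _1 beneath the data in the columns without the trailing _1 (if that makes sense). The final dataset should look like this: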

    Id    Col1_ABC   Col2_BCD    Col3_CBD   Status   
    1     Yes        No          No         Pre  
    2     No         No          No         Pre  
    3     Yes        Yes         Yes        Pre  
    4     Yes        No          Yes        Pre
    5     No         No          No         Pre

    1     Yes        Yes         No         Post
    2     Yes        No          Yes        Post
    3     No         Yes         No         Post

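I know a very clunky way to do this that involves subsetting, renaming, and doing an rbind, but I am looking for a more efficient approach. Any suggestions are much appreciated.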
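
For reference, the subset, rename, and rbind approach mentioned above might look roughly like the following in base R. This is only a sketch of one way to do it: the helper names `post_cols`, `pre_cols`, `pre`, `post`, and `stacked` are illustrative and not from the original post, and it assumes the empty cells are stored as NA.

```r
# Example data copied from the question (NA marks the empty cells)
df1 <- data.frame(
  Id = 1:5,
  Col1_ABC = c("Yes", "No", "Yes", "Yes", "No"),
  Col2_BCD = c("No", "No", "Yes", "No", "No"),
  Col3_CBD = c("No", "No", "Yes", "Yes", "No"),
  Col1_ABC_1 = c("Yes", "Yes", "No", NA, NA),
  Col2_BCD_1 = c("Yes", "No", "Yes", NA, NA),
  Col3_CBD_1 = c("No", "Yes", "No", NA, NA),
  stringsAsFactors = FALSE
)

# Illustrative helper names: columns ending in _1 and their counterparts
post_cols <- grep("_1$", names(df1), value = TRUE)
pre_cols  <- sub("_1$", "", post_cols)

# Subset the "Pre" block and tag it
pre <- df1[c("Id", pre_cols)]
pre$Status <- "Pre"

# Subset the "Post" block, rename it to match, drop all-NA rows, tag it
post <- df1[c("Id", post_cols)]
names(post) <- c("Id", pre_cols)
post <- post[rowSums(!is.na(post[pre_cols])) > 0, ]
post$Status <- "Post"

# Stack the two blocks
stacked <- rbind(pre, post)
rownames(stacked) <- NULL
print(stacked)
```

This works, but every new suffix would need its own subset and rename step, which is why a reshaping function scales better.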

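We can use pivot_longer from tidyr to reshape the columns from 'wide' to 'long' format. The columns without a suffix are first renamed to end in _0, so that every column name splits into a base name and a status index.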

library(dplyr)
library(tidyr)
library(stringr)
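df1 %>%
    # append _0 to names ending in an uppercase letter (no numeric suffix yet)
    rename_at(-1, ~ str_replace(., '([A-Z])$', '\\1_0')) %>% 
    # split each name at the underscore that precedes the digit
    pivot_longer(cols = -Id, names_to = c( ".value", "Status"),
         names_sep = "_(?=[0-9])", values_drop_na = TRUE) %>%
    mutate(Status = factor(Status, levels = 0:1, labels = c("Pre", "Post"))) %>%
    arrange(Status) 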
#   Id Status Col1_ABC Col2_BCD Col3_CBD
#1  1    Pre      Yes       No       No
#2  2    Pre       No       No       No
#3  3    Pre      Yes      Yes      Yes
#4  4    Pre      Yes       No      Yes
#5  5    Pre       No       No       No
#6  1   Post      Yes      Yes       No
#7  2   Post      Yes       No      Yes
#8  3   Post       No      Yes       No
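
To see what the rename step and the `names_sep` pattern do to the column names, here is a small base R sketch that applies the same two regular expressions by hand. It uses `sub` and `strsplit` as stand-ins for illustration (this is not how tidyr implements the split internally), and the names `nms`, `nms0`, `parts`, and `name_map` are made up for this example.

```r
# Column names from the example data (Id excluded)
nms <- c("Col1_ABC", "Col2_BCD", "Col3_CBD",
         "Col1_ABC_1", "Col2_BCD_1", "Col3_CBD_1")

# Step 1: names ending in an uppercase letter get a "_0" suffix
nms0 <- sub("([A-Z])$", "\\1_0", nms)

# Step 2: split at the underscore that is followed by a digit
# (perl = TRUE is needed for the lookahead)
parts <- strsplit(nms0, "_(?=[0-9])", perl = TRUE)

# Collect the pieces: the first part becomes the value column,
# the second part becomes the Status index
name_map <- data.frame(
  original  = nms,
  value_col = vapply(parts, `[`, character(1), 1),
  status    = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
print(name_map)
```

The underscore inside `Col1_ABC` is left alone because it is followed by a letter, so only the trailing `_0` or `_1` is split off.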

Alternatively, another option is melt from data.table, where we specify patterns in measure to convert the data from 'wide' to 'long' format

library(data.table)
melt(setDT(df1), measure = patterns("Col1_ABC", "Col2_BCD", "Col3_CBD"), 
    na.rm = TRUE, variable.name = 'Status',
   value.name = c("Col1_ABC", "Col2_BCD", "Col3_CBD"))[,
         Status := c("Pre", "Post")[Status]][]
#   Id Status Col1_ABC Col2_BCD Col3_CBD
#1:  1    Pre      Yes       No       No
#2:  2    Pre       No       No       No
#3:  3    Pre      Yes      Yes      Yes
#4:  4    Pre      Yes       No      Yes
#5:  5    Pre       No       No       No
#6:  1   Post      Yes      Yes       No
#7:  2   Post      Yes       No      Yes
#8:  3   Post       No      Yes       No

data

df1 <- structure(list(Id = 1:5, Col1_ABC = c("Yes", "No", "Yes", "Yes", 
"No"), Col2_BCD = c("No", "No", "Yes", "No", "No"), Col3_CBD = c("No", 
"No", "Yes", "Yes", "No"), Col1_ABC_1 = c("Yes", "Yes", "No", 
NA, NA), Col2_BCD_1 = c("Yes", "No", "Yes", NA, NA), Col3_CBD_1 = c("No", 
"Yes", "No", NA, NA)), class = "data.frame", row.names = c(NA, 
-5L))

Another option involving dplyr and tidyr could be:

 df1 %>%
  pivot_longer(-Id) %>%
  # names containing "_1" are Post; then strip the suffix from the name
  mutate(Status = if_else(grepl("_1", name, fixed = TRUE), "Post", "Pre"),
         name = gsub("^([^_]*_[^_]*)_.*$", "\\1", name)) %>%
  pivot_wider(names_from = "name", values_from = "value") %>%
  filter_all(all_vars(!is.na(.))) %>%
  arrange(Status)

     Id Status Col1_ABC Col2_BCD Col3_CBD
  <int> <chr>  <chr>    <chr>    <chr>   
1     1 Post   Yes      Yes      No      
2     2 Post   Yes      No       Yes     
3     3 Post   No       Yes      No      
4     1 Pre    Yes      No       No      
5     2 Pre    No       No       No      
6     3 Pre    Yes      Yes      Yes     
7     4 Pre    Yes      No       Yes     
8     5 Pre    No       No       No