case_when with multiple conditions in dplyr R

Question

I have a data.frame that looks like this:

df <- data.frame(Day = c(0, 0, 0, 1, 1, 1),
                 type = c("tr1", "tr2", "ctrl", "tr1", "tr2", "ctrl"),
                 mean = c(0.211, 0203, 0.199, 0.119, 0.001, 0.254),
                 sd = c(0.07, 0.141, 0.096, 0.0848, 0.0006, 0.0474))

  Day type    mean     sd
1   0  tr1   0.211 0.0700
2   0  tr2 203.000 0.1410
3   0 ctrl   0.199 0.0960
4   1  tr1   0.119 0.0848
5   1  tr2   0.001 0.0006
6   1 ctrl   0.254 0.0474

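First, I want to group my data frame by Day with group_by(Day). Within each group, when the sum (mean + sd) for each treatment type (tr1, tr2) is greater than the control's (ctrl) difference (mean - sd), I want to assign the value "yes" in a new column (new.col); if not, I want to assign the value "no".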

For example, I would like my data to look something like this. It does not have to look exactly like this.

  Day type    mean     sd new.col
1   0  tr1   0.211 0.0700     yes
2   0  tr2 203.000 0.1410     yes
3   0 ctrl   0.199 0.0960      NA
4   1  tr1   0.119 0.0848      no
5   1  tr2   0.001 0.0006      no
6   1 ctrl   0.254 0.0474      NA

Answer 1

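After grouping by 'Day', one option is to subset the 'mean' and 'sd' values where 'type' is not (!=) "ctrl", add (+) the two columns, take the sum across the treatment rows of the group, and check whether it is greater than (>) the difference of the corresponding 'mean' and 'sd' values (mean - sd) where 'type' is "ctrl". Adding 1 converts the logical result to a numeric index, which is then used to pick from a vector of replacement values (c("NO", "Yes")). Finally, case_when sets the rows where 'type' is "ctrl" to NA.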

library(dplyr)
df %>% 
    group_by(Day) %>% 
    mutate(new.col = case_when(
        # control rows get NA
        type == "ctrl" ~ NA_character_, 
        # FALSE/TRUE + 1 gives index 1/2 into c("NO", "Yes")
        TRUE ~ c("NO", "Yes")[1 + (sum(mean[type != "ctrl"] + sd[type != "ctrl"]) >
                                   (mean[type == "ctrl"] - sd[type == "ctrl"]))])) %>%
    ungroup()

Output

# A tibble: 6 x 5
    Day type     mean     sd new.col
  <dbl> <chr>   <dbl>  <dbl> <chr>  
1     0 tr1     0.211 0.07   Yes    
2     0 tr2   203     0.141  Yes    
3     0 ctrl    0.199 0.096  <NA>   
4     1 tr1     0.119 0.0848 NO     
5     1 tr2     0.001 0.0006 NO     
6     1 ctrl    0.254 0.0474 <NA>
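
The indexing trick used above can be looked at in isolation. Here is a minimal base R sketch (the names `labels`, `cond`, `idx` and `result` are made up for illustration and do not appear in the answer): adding 1 to a logical vector turns FALSE/TRUE into the positions 1/2, which then select from a two-element vector of labels.

```r
# Hypothetical names, used only for illustration
labels <- c("NO", "Yes")
cond <- c(TRUE, FALSE, TRUE)

# FALSE + 1 is 1 and TRUE + 1 is 2, so the logical vector becomes positions
idx <- 1 + cond

# Position 1 picks "NO", position 2 picks "Yes"
result <- labels[idx]
print(result)
```

This is compact, but it relies on the order of the label vector, so the value for FALSE has to come first.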

Answer 2

Another option with dplyr is:

library(dplyr)

df %>% 
  dplyr::left_join(df %>% dplyr::filter(type == "ctrl"), by = "Day", suffix = c("_t", "_c")) %>%
  dplyr::group_by(Day, type_t) %>%
  dplyr::mutate(new.col = case_when(type_t == "ctrl" ~ NA_character_,
                                   sum(mean_t + sd_t) > (mean(mean_c - sd_c)) ~ "yes",
                                   TRUE ~ "no")) %>%
  dplyr::ungroup() %>%
  dplyr::select(Day, type = type_t, mean = mean_t, sd = sd_t, new.col)

# A tibble: 6 x 5
    Day type     mean     sd new.col
  <dbl> <chr>   <dbl>  <dbl> <chr>  
1     0 tr1     0.211 0.07   yes    
2     0 tr2   203     0.141  yes    
3     0 ctrl    0.199 0.096  NA     
4     1 tr1     0.119 0.0848 no     
5     1 tr2     0.001 0.0006 no     
6     1 ctrl    0.254 0.0474 NA
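
The same per-row comparison could probably also be written without the join, by computing the control threshold once per Day. The following is only a sketch: it assumes exactly one "ctrl" row per Day, and `ctrl_low` and `res` are hypothetical names introduced here for illustration.

```r
library(dplyr)

# Same data as in the question (the literal 0203 there is read by R as 203)
df <- data.frame(
  Day  = c(0, 0, 0, 1, 1, 1),
  type = c("tr1", "tr2", "ctrl", "tr1", "tr2", "ctrl"),
  mean = c(0.211, 203, 0.199, 0.119, 0.001, 0.254),
  sd   = c(0.07, 0.141, 0.096, 0.0848, 0.0006, 0.0474)
)

res <- df %>%
  group_by(Day) %>%
  # ctrl_low (hypothetical helper column): the control's mean - sd for that Day,
  # recycled to every row of the group; assumes one ctrl row per Day
  mutate(ctrl_low = mean[type == "ctrl"] - sd[type == "ctrl"]) %>%
  ungroup() %>%
  # Each treatment row is compared on its own against the Day's threshold
  mutate(new.col = case_when(
    type == "ctrl"       ~ NA_character_,
    mean + sd > ctrl_low ~ "yes",
    TRUE                 ~ "no"
  )) %>%
  select(-ctrl_low)

print(res)
```

If a Day had no control row, or more than one, the `mutate()` step would fail or recycle unexpectedly, so the data would need to be checked first.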
