Create a column Type based on several conditions with different dataframe inputs in R

Question

I have a dataframe like this:

id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

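I am trying to apply several rules whose outcome depends on the dataframe that is passed in.

These are the conditions I am working with:

If the total number of rows in the dataframe is < 5, print "Not enough ids".

Example: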

id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

Expected output

"Not enough ids"

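If the total number of rows is >= 5 and any individual state in state_code has >= 5 rows, create a column Type = state_code for the rows of that state; all other rows get Type = "Combined".

Example: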

id <- c(100,101,102,103,104,105,106,107,108,109,110,111,112,113,114)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

Expected output

    id state_code     Type
   100         CA       CA
   101         CA       CA
   102         CA       CA
   103         CA       CA
   104         CA       CA
   105         CA       CA
   106         TX       TX
   107         TX       TX
   108         TX       TX
   109         TX       TX
   110         TX       TX
   111         TX       TX
   112         AZ Combined
   113         MN Combined
   114         CO Combined

If the total number of rows is >= 5 and no individual state in state_code has >= 5 rows, create a column Type = "Combined" for all rows.

Example:

id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

Expected output

    id state_code     Type
   100         CA Combined
   101         CA Combined
   102         CA Combined
   103         CA Combined
   104         TX Combined
   105         TX Combined
   106         TX Combined
   107         TX Combined
   108         AZ Combined
   109         MN Combined
   110         CO Combined

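I am handling the first case this way, but I cannot work out how to do the same for the other cases:

if (nrow(df.sample) < 5) {
    cat("Not enough ids")
}

How can I wrap all of this logic into a single piece of code? Can someone point me in the right direction?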

Answer 1

Conditions 2 and 3 are the same rule applied per state, so they can be merged. Try this function.

library(dplyr)

foo <- function(data){
  if(nrow(data) < 5 ) {
    return("Not enough ids")
  } else {
    data %>%
      group_by(state_code) %>%
      mutate(Type = case_when(n() < 5 ~ 'Combined',
                              TRUE ~ state_code)) %>%
      ungroup()
  }
}
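
As a cross-check on why conditions 2 and 3 collapse into one rule, here is a minimal base R sketch of the same idea. It is not the answerer's code: the helper name `add_type` and the `min_rows` argument are made up for illustration, and it assumes `state_code` is a character column (as in the question, where `stringsAsFactors=FALSE`). The per-state row count is computed with `ave()`, and each row keeps its state code only if its state reaches the threshold.

```r
# Hypothetical helper, base R only; min_rows = 5 mirrors the threshold in the question.
add_type <- function(data, min_rows = 5) {
  if (nrow(data) < min_rows) {
    return("Not enough ids")
  }
  # For every row, the number of rows that share its state_code
  n_state <- ave(seq_len(nrow(data)), data$state_code, FUN = length)
  # Assumes state_code is character; a factor would need as.character() first
  data$Type <- ifelse(n_state >= min_rows, data$state_code, "Combined")
  data
}

# Second example from the question: CA and TX each have 6 rows and keep their codes
df2 <- data.frame(
  id = 100:114,
  state_code = c(rep("CA", 6), rep("TX", 6), "AZ", "MN", "CO"),
  stringsAsFactors = FALSE
)
add_type(df2)

# Third example: no state reaches 5 rows, so the same rule labels every row "Combined"
df3 <- data.frame(
  id = 100:110,
  state_code = c(rep("CA", 4), rep("TX", 4), "AZ", "MN", "CO"),
  stringsAsFactors = FALSE
)
add_type(df3)
```

Because the threshold is checked per state, the "no state has 5 rows" case needs no separate branch.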

Answer 2

Does this work:

library(dplyr)

rowscount <- function(df, id_col){
  if(nrow(df) < 5)
    return('Not enough ids')
  else{
    # Per group: keep the group's own value when it has >= 5 rows, else 'Combined'.
    # A plain if/else is used because n() >= 5 is a single value per group.
    op_df = df %>% group_by({{id_col}}) %>% mutate(Type = if (n() >= 5) {{id_col}} else 'Combined')
    return(op_df)
  }
}
rowscount(df.sample, state_code)
# A tibble: 11 x 3
# Groups:   state_code [5]
      id state_code Type
   <dbl> <chr>      <chr>
 1   100 CA         CA
 2   101 CA         CA
 3   102 CA         CA
 4   103 CA         CA
 5   104 CA         CA
 6   105 CA         CA
 7   106 TX         Combined
 8   107 TX         Combined
 9   108 AZ         Combined
10   109 MN         Combined
11   110 CO         Combined

id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)

rowscount(df.sample, state_code)
[1] "Not enough ids"
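
For readers who prefer data.table, the same rule could be sketched roughly as follows. This is an illustration rather than part of either answer: the function name `add_type_dt`, the `min_rows` argument, and the temporary column `n_state` are assumptions, and the sketch assumes `state_code` is a character column and that the input has no existing column called `n_state`.

```r
library(data.table)

# Hypothetical helper; min_rows = 5 mirrors the threshold in the question.
add_type_dt <- function(data, min_rows = 5) {
  if (nrow(data) < min_rows) {
    return("Not enough ids")
  }
  dt <- as.data.table(data)               # works on a copy of the input data frame
  dt[, n_state := .N, by = state_code]    # rows per state, repeated on every row
  dt[, Type := fifelse(n_state >= min_rows, state_code, "Combined")]
  dt[, n_state := NULL]                   # drop the temporary count column
  dt[]
}

# First data frame from the question: only CA has at least 5 rows
df1 <- data.frame(
  id = 100:110,
  state_code = c(rep("CA", 6), "TX", "TX", "AZ", "MN", "CO"),
  stringsAsFactors = FALSE
)
add_type_dt(df1)
```

Like the functions above, it returns either a string or a table, so calling code has to be prepared for both.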
