Create a column Type based on several conditions with different dataframe inputs in R
I have a dataframe like this:
id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
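I am trying to apply several filters whose result depends on what the input dataframe looks like.
These are the conditions I am working with:
- If the total number of rows in the whole dataframe is < 5, print "Not enough ids".
Example: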
id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
Expected output:
"Not enough ids"
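- If the total number of rows is >= 5 and any individual state in state_code has >= 5 rows, create a column Type = state_code for the rows of those states; otherwise Type = "Combined".
Example: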
id <- c(100,101,102,103,104,105,106,107,108,109,110,111,112,113,114)
state_code <- c("CA","CA","CA","CA","CA","CA","TX","TX","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
Expected output:
id   state_code  Type
100  CA          CA
101  CA          CA
102  CA          CA
103  CA          CA
104  CA          CA
105  CA          CA
106  TX          TX
107  TX          TX
108  TX          TX
109  TX          TX
110  TX          TX
111  TX          TX
112  AZ          Combined
113  MN          Combined
114  CO          Combined
- If the total number of rows is >= 5 and no individual state in state_code has >= 5 rows, create a column Type = "Combined" for all rows.
Example:
id <- c(100,101,102,103,104,105,106,107,108,109,110)
state_code <- c("CA","CA","CA","CA","TX","TX","TX","TX","AZ","MN","CO")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
Expected output:
id   state_code  Type
100  CA          Combined
101  CA          Combined
102  CA          Combined
103  CA          Combined
104  TX          Combined
105  TX          Combined
106  TX          Combined
107  TX          Combined
108  AZ          Combined
109  MN          Combined
110  CO          Combined
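I tried to handle the first case like this, but I could not work out how to do the other cases:
if (nrow(df.sample) < 5) {
  cat("Not enough ids")
}
How can I wrap all of this logic into a single piece of code? Can someone point me in the right direction?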
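Conditions 2 and 3 are the same rule and can be merged: a state keeps its own code only if it has at least 5 rows, and everything else becomes "Combined". Try this function.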
library(dplyr)

foo <- function(data) {
  # Case 1: fewer than 5 rows overall
  if (nrow(data) < 5) {
    return("Not enough ids")
  }
  # Cases 2 and 3: states with fewer than 5 rows become "Combined"
  data %>%
    group_by(state_code) %>%
    mutate(Type = case_when(n() < 5 ~ "Combined",
                            TRUE ~ state_code)) %>%
    ungroup()
}
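If you would rather not group and ungroup, the same rule can probably be written with add_count(), which stores each state's row count as an ordinary column. This is only a sketch under a few assumptions of mine: the name add_type, the min_rows argument (used for both the overall check and the per-state check), and the temporary n_state column are all made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical helper; `min_rows` is an assumed parameter used for both thresholds.
add_type <- function(data, min_rows = 5) {
  # Case 1: too few rows overall
  if (nrow(data) < min_rows) {
    return("Not enough ids")
  }
  data %>%
    add_count(state_code, name = "n_state") %>%  # rows per state, as a plain column
    mutate(Type = if_else(n_state >= min_rows, state_code, "Combined")) %>%
    select(-n_state)                             # drop the temporary count
}

# The three example inputs from the question
few  <- data.frame(id = 100:103,
                   state_code = c("CA", "CA", "TX", "CA"),
                   stringsAsFactors = FALSE)
big  <- data.frame(id = 100:114,
                   state_code = c(rep("CA", 6), rep("TX", 6), "AZ", "MN", "CO"),
                   stringsAsFactors = FALSE)
none <- data.frame(id = 100:110,
                   state_code = c(rep("CA", 4), rep("TX", 4), "AZ", "MN", "CO"),
                   stringsAsFactors = FALSE)

add_type(few)                 # "Not enough ids"
add_type(big)$Type            # CA x6, TX x6, then Combined x3
unique(add_type(none)$Type)   # "Combined"
```

Because the count is a regular column here, the result is not grouped, so there is nothing to ungroup afterwards.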
Does this work?
library(dplyr)

rowscount <- function(df, id_col) {
  if (nrow(df) < 5) {
    return("Not enough ids")
  }
  df %>%
    group_by({{ id_col }}) %>%
    # n() is the group's row count, so a plain if/else picks one branch per group
    mutate(Type = if (n() >= 5) {{ id_col }} else "combined")
}

rowscount(df.sample, state_code)
# A tibble: 11 x 3
# Groups:   state_code [5]
      id state_code Type
   <dbl> <chr>      <chr>
 1   100 CA         CA
 2   101 CA         CA
 3   102 CA         CA
 4   103 CA         CA
 5   104 CA         CA
 6   105 CA         CA
 7   106 TX         combined
 8   107 TX         combined
 9   108 AZ         combined
10   109 MN         combined
11   110 CO         combined
id <- c(100,101,102,103)
state_code <- c("CA","CA","TX","CA")
df.sample <- data.frame(id,state_code,stringsAsFactors=FALSE)
rowscount(df.sample, state_code)
[1] "Not enough ids"
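For anyone who wants to avoid the dplyr dependency, the same logic can be sketched in base R with ave(), which returns each row's group size aligned with the rows of the dataframe. The function name add_type_base and the min_rows argument are my own assumptions, and the sketch expects state_code to be a character column, as in the question's examples.

```r
# Hypothetical base R helper; no packages required.
add_type_base <- function(df, min_rows = 5) {
  # Case 1: too few rows overall
  if (nrow(df) < min_rows) {
    return("Not enough ids")
  }
  # For every row, the number of rows that share its state_code
  n_per_state <- ave(seq_len(nrow(df)), df$state_code, FUN = length)
  # Keep the state code for large states, label the rest "Combined"
  df$Type <- ifelse(n_per_state >= min_rows, df$state_code, "Combined")
  df
}

# The three example inputs from the question
few  <- data.frame(id = 100:103,
                   state_code = c("CA", "CA", "TX", "CA"),
                   stringsAsFactors = FALSE)
big  <- data.frame(id = 100:114,
                   state_code = c(rep("CA", 6), rep("TX", 6), "AZ", "MN", "CO"),
                   stringsAsFactors = FALSE)
none <- data.frame(id = 100:110,
                   state_code = c(rep("CA", 4), rep("TX", 4), "AZ", "MN", "CO"),
                   stringsAsFactors = FALSE)

add_type_base(few)                 # "Not enough ids"
add_type_base(big)$Type            # CA x6, TX x6, then Combined x3
unique(add_type_base(none)$Type)   # "Combined"
```

Note that the function returns either a string or a dataframe, like the answers above, so the caller has to check which one came back before using the result.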