Way to avoid nested ifelse statement in R

Question

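I have a data frame that contains various ages and a value for each age. I would like to group the individual ages into broader age bands. To do this, I had to write a fairly complicated nested ifelse statement: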

library(tidyverse)

df <- data.frame(age = c("15 to 17", "18 and 19", "20 to 24", "25 to 29", "30 to 34", "35 to 39", "40 to 44", "45 to 49", "50 to 54", "55 to 59"),
                 value = sample(1000:2000,10, replace=TRUE))

new_df = df %>% 
  mutate(age_band = 
           ifelse(age %in% c("15 to 17","18 and 19"), '15 to 19', ifelse(age %in% c("20 to 24","25 to 29"), '20s', ifelse(age %in% c("30 to 34","35 to 39"), '30s','40+'))))

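Is there a way to do this without such a complicated nested statement? The data goes all the way up to ages 85 and over, and classifying each age group this way gets very complicated.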

Answer 1

Following @Claudiu Papasteri's suggestion:

Packages

library(dplyr)

Solution

df %>% 
  mutate(age_band = case_when( age %in% c("15 to 17","18 and 19") ~  '15 to 19',
                               age %in% c("20 to 24","25 to 29") ~ '20s',
                               age %in% c("30 to 34","35 to 39") ~ '30s',
                               TRUE ~'40+')
         )

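P.S.: I would spell out every case in case_when, including the final "40+" one, so that you can trace any problems in your dataset. With a catch-all, anything that is missing or misspelled gets coded as "40+". By stating each case explicitly, you can spot and fix problems before running analyses or producing charts. The last statement can then be changed to TRUE ~ age, which keeps the original value for whatever is left over, or to TRUE ~ NA, which recodes the leftovers as missing so you can find them (on dplyr versions before 1.1.0, write NA_character_ instead, because all outputs must share one type). Any NA in the result then tells you there is a problem you need to fix.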

Output

         age value age_band
1   15 to 17  1432 15 to 19
2  18 and 19  1112 15 to 19
3   20 to 24  1265      20s
4   25 to 29  1076      20s
5   30 to 34  1212      30s
6   35 to 39  1238      30s
7   40 to 44  1384      40+
8   45 to 49  1612      40+
9   50 to 54  1606      40+
10  55 to 59  1897      40+
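
The postscript's advice can be sketched as follows. This is only an illustration: the data frame df_check and its deliberately misspelled label "20 to 42" are made up for the example and are not part of the original data.

```r
library(dplyr)

# Hypothetical data: "20 to 24" is deliberately misspelled as "20 to 42"
df_check <- data.frame(
  age = c("15 to 17", "18 and 19", "20 to 42", "25 to 29", "40 to 44"),
  value = c(1432, 1112, 1265, 1076, 1384)
)

banded <- df_check %>%
  mutate(age_band = case_when(
    age %in% c("15 to 17", "18 and 19") ~ "15 to 19",
    age %in% c("20 to 24", "25 to 29") ~ "20s",
    age %in% c("30 to 34", "35 to 39") ~ "30s",
    age %in% c("40 to 44", "45 to 49", "50 to 54", "55 to 59") ~ "40+",
    TRUE ~ NA_character_  # anything unmatched becomes NA instead of "40+"
  ))

# Rows that matched no explicit case: these point at typos or missing groups
unmatched <- banded %>% filter(is.na(age_band))
print(unmatched)
```

With a TRUE ~ '40+' fallback, the misspelled row would have been silently counted in the "40+" band. Here it surfaces as the only row in unmatched.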

Answer 2

What you can do is use the first two characters to get the starting age of each group, then use cut and define your own breaks and labels.

Code

df %>% 
  mutate(age_band = cut(
    as.numeric(substr(age, 1, 2)), 
    breaks = c(15, 20, 30, 40, 100), 
    labels = c("15 to 19", "20s", "30s", "40+"), 
    right = FALSE)
  )

Output

         age value  age_band
1   15 to 17  1216  15 to 19
2  18 and 19  1983  15 to 19
3   20 to 24  1839       20s
4   25 to 29  1558       20s
5   30 to 34  1741       30s
6   35 to 39  1171       30s
7   40 to 44  1324       40+
8   45 to 49  1354       40+
9   50 to 54  1342       40+
10  55 to 59  1467       40+
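
Since the question mentions that the real data runs up to 85 and over, here is a rough sketch of how the cut approach might scale to decade bands. The age labels beyond the sample data (such as "85 and over") and the "80+" top band are assumptions made for illustration, not values taken from the original data.

```r
# Hypothetical age labels extending past the sample data
age <- c("15 to 17", "18 and 19", "20 to 24", "35 to 39",
         "60 and 61", "75 to 79", "85 and over")

# Lower bound of each group, taken from the first two characters
lower <- as.numeric(substr(age, 1, 2))

# One break per decade; Inf leaves the top band open-ended
breaks <- c(15, 20, seq(30, 80, by = 10), Inf)
labels <- c("15 to 19", paste0(seq(20, 70, by = 10), "s"), "80+")

# right = FALSE makes each interval closed on the left: [20, 30), [30, 40), ...
age_band <- cut(lower, breaks = breaks, labels = labels, right = FALSE)
print(data.frame(age, age_band))
```

Generating the breaks and labels with seq keeps the code the same length no matter how many decades the data covers. Note that substr(age, 1, 2) only works while every group starts with a two-digit age; a label such as "100 and over" would need a different way of extracting the number.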
