Create an indicator variable in one data frame based on values in another data frame
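Say I have a dataset called iris. I want to create an indicator variable called sepal_length_group in this dataset. Its values will be p25, p50, p75 and p100. For example, I want sepal_length_group to equal "p25" if Species is "setosa" and Sepal.Length is less than or equal to the 25th percentile of all observations classified as "setosa". I wrote the code below, but it generates all NAs: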
library(dplyr)  # needed for %>%, group_by(), select() and mutate()
library(skimr)
sepal_length_distribution <- iris %>% group_by(Species) %>% skim(Sepal.Length) %>% select(3, 9:12)
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"), 2], "p25", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"), 2] &
    Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"), 3], "p50", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"), 3] &
    Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"), 4], "p75", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"), 4] &
    Sepal.Length < sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"), 5], "p100", NA))
Any help is greatly appreciated!
As noted in the comments, this can be done simply with the function cut:
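library(tidyverse)
iris %>%
  group_by(Species) %>%   # quantiles are computed separately for each species
  mutate(cat = cut(Sepal.Length,
                   breaks = quantile(Sepal.Length, c(0, .25, .5, .75, 1)),
                   labels = paste0('p', c(25, 50, 75, 100)),
                   include.lowest = TRUE))   # keep the group minimum in the first bin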
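If the cut-offs really do need to live in a separate data frame, as the title suggests, one possible alternative is to join them onto iris by Species and label the rows with case_when. This is only a sketch, not the approach from the comments: the names cutoffs, q25, q50 and q75 are made up for illustration, and the quantiles come from dplyr's summarise rather than skimr. It should produce the same bins as the cut call above.

```r
library(dplyr)

# Per-species quartile cut-offs, kept in their own data frame.
# `cutoffs`, `q25`, `q50` and `q75` are illustrative names, not from the question.
cutoffs <- iris %>%
  group_by(Species) %>%
  summarise(
    q25 = quantile(Sepal.Length, 0.25),
    q50 = quantile(Sepal.Length, 0.50),
    q75 = quantile(Sepal.Length, 0.75),
    .groups = "drop"
  )

iris_2 <- iris %>%
  left_join(cutoffs, by = "Species") %>%   # each row now carries its own species' cut-offs
  mutate(sepal_length_group = case_when(
    Sepal.Length <= q25 ~ "p25",
    Sepal.Length <= q50 ~ "p50",
    Sepal.Length <= q75 ~ "p75",
    TRUE                ~ "p100"           # everything above the 75th percentile
  )) %>%
  select(-q25, -q50, -q75)                 # drop the helper columns again

# Quick check: every row should have a label, so no NA column is expected
table(iris_2$Species, iris_2$sepal_length_group, useNA = "ifany")
```

Because the comparison happens row by row after the join, each observation is compared with the cut-offs of its own species, not with a single hard-coded "setosa" row.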