Create variable in data frame using another column's quartile value
I want to create a variable in a data frame that classifies each observation according to the quartile/median values of a column.
Here is what I tried.
Name<-c("name1","name2","name3","name4","name5","name6")
Age<-c(49,12,29,55,25,19)
df9<-data.frame(Name,Age)
df9$catoG[df9$Age<=quantile(df9$Age,0.25)]<-"Young"
df9$catoG[df9$Age>quantile(df9$Age,0.25) & df9$Age<=median(df9$Age)]<-"Adult"
df9$catoG[df9$Age>median(df9$Age)]<-"Elder"
The output I get is
   Name Age catoG
1 name1  49 Elder
2 name2  12 Young
3 name3  29 Elder
4 name4  55 Elder
5 name5  25 Adult
6 name6  19 Young
Is there a more efficient way to achieve the same result in R?
You can use dplyr::mutate and dplyr::case_when from the dplyr package:
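library(dplyr)  # provides %>%, mutate() and case_when()

Name<-c("name1","name2","name3","name4","name5","name6")
Age<-c(49,12,29,55,25,19)
df9<-data.frame(Name,Age)
# case_when() checks the conditions in order; TRUE is the catch-all for Age > median
df9 %>% mutate(catoG = case_when(Age<=quantile(Age,0.25) ~ 'Young',
                                 Age>quantile(Age,0.25) & Age<=median(Age) ~ 'Adult',
                                 TRUE ~ 'Elder'))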
cut is your friend for any task that involves splitting a vector into ranges:
df9$new = cut(df9$Age,
              breaks = c(-Inf, quantile(df9$Age,c(0.25, 0.5)), Inf),
              labels = c('Young', 'Adult', 'Elder') )
#   Name Age catoG   new
#1 name1  49 Elder Elder
#2 name2  12 Young Young
#3 name3  29 Elder Elder
#4 name4  55 Elder Elder
#5 name5  25 Adult Adult
#6 name6  19 Young Young
The following function creates a vector of quantile groups (n groups) from a numeric vector (so n = 4 gives quartiles):
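qgroup = function(numvec, n = 4){
  # n + 1 cut points: the minimum, the n - 1 inner quantiles, and the maximum
  qtile = quantile(numvec, probs = seq(0, 1, 1/n))
  # group number = how many of the lower cut points (maximum excluded) x reaches
  out = sapply(numvec, function(x) sum(x >= qtile[-(n+1)]))
  return(out)
}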
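As far as I can tell, counting how many cut points a value has reached is what base R's findInterval does, so the sapply loop can probably be replaced by a single vectorised call. The sketch below repeats qgroup so that it runs on its own. The name qgroup_fi is hypothetical and only used for the comparison.

```r
# qgroup as defined above, repeated so this sketch is self-contained
qgroup <- function(numvec, n = 4) {
  qtile <- quantile(numvec, probs = seq(0, 1, 1/n))
  sapply(numvec, function(x) sum(x >= qtile[-(n + 1)]))
}

# Hypothetical vectorised variant: findInterval(x, vec) returns, for each x,
# the number of elements of the sorted vector vec that are <= x
qgroup_fi <- function(numvec, n = 4) {
  qtile <- quantile(numvec, probs = seq(0, 1, 1/n), names = FALSE)
  findInterval(numvec, qtile[-(n + 1)])
}

Age <- c(49, 12, 29, 55, 25, 19)
print(qgroup(Age))
print(qgroup_fi(Age))
```

Both versions put a value that is exactly equal to a quantile into the higher group, whereas the `<=` comparisons in the question put it into the lower one. With the sample ages no value sits on a boundary, so the results agree, but the difference may matter for other data.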
Applying the function to your data:
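library(data.table)  # provides data.table() and the := operator

Name = c("name1","name2","name3","name4","name5","name6")
Age = c(49,12,29,55,25,19)
df9 = data.table(Name,Age)
df9[, Q := qgroup(Age)]  # add the quartile group as a new column, by reference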
> df9
Name Age Q
1: name1 49 4
2: name2 12 1
3: name3 29 3
4: name4 55 4
5: name5 25 2
6: name6 19 1
Finally, we label the quartile groups:
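# Indexing a plain character vector by Q keeps Label an ordinary character column
# (a list combined with sapply() would create a list column instead)
labels = c('Young', 'Adult', 'Elder', 'Elder')
df9[, Label := labels[Q]]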
> df9
Name Age Q Label
1: name1 49 4 Elder
2: name2 12 1 Young
3: name3 29 3 Elder
4: name4 55 4 Elder
5: name5 25 2 Adult
6: name6 19 1 Young