How to loop through a dataframe with multiple values to use the max to identify the value
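I'm trying to find the maximum value within each group, along with the rest of the row that holds that maximum, in data where several different groups repeat.
For example, the dataset looks like this: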
newid sex visitnum sbpval
36 13580 M 2 NA
37 13580 M 3 124
38 13580 M 4 116
39 21525 F 2 410
40 21525 F 3 116
I want the output to look like this:
newid sex visitnum sbpval
1 13580 M 3 124
2 21525 F 2 410
I'm trying to write a loop to go through the data, but I'm having trouble figuring out how to group the rows.
Here is my code so far:
> for (i in 1:length(df)){
+ maxsbp = max(i, na.rm = F)
+ max = cbind(maxsbp,max)
+ result = cbind(result, df[[i]])
+ }
> result
This is what it gives me:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[36,] 32 2 NA 32 2 NA "13580" "M" 2 NA
[37,] 32 3 124 32 3 124 "13580" "M" 3 124
[38,] 32 4 116 32 4 116 "13580" "M" 4 116
[39,] 33 2 410 33 2 410 "21525" "F" 2 410
[40,] 33 3 116 33 3 116 "21525" "F" 3 116
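For reference, length(df) on a data frame is its number of columns, so the loop above iterates over columns rather than groups. Also, max(i, ...) takes the maximum of the loop counter, not of sbpval. If an explicit loop is still wanted, here is a minimal sketch. It assumes newid identifies the group, as the desired output suggests. The loop goes over the unique ids and keeps the row that which.max() points to. The names result, grp and best are only illustrative.

```r
# Data from the question (same as the dput() output in the answer)
df <- structure(list(newid = c(13580L, 13580L, 13580L, 21525L, 21525L),
                     sex = c("M", "M", "M", "F", "F"),
                     visitnum = c(2L, 3L, 4L, 2L, 3L),
                     sbpval = c(NA, 124L, 116L, 410L, 116L)),
                class = "data.frame",
                row.names = c("36", "37", "38", "39", "40"))

# Loop over the groups (unique newid values), not over the columns of df
result <- NULL
for (id in unique(df$newid)) {
  grp  <- df[df$newid == id, ]          # all rows belonging to this id
  best <- which.max(grp$sbpval)         # position of the max; NAs are skipped
  result <- rbind(result, grp[best, ])  # keep the whole row, not just the value
}
rownames(result) <- NULL
result
```

which.max() returns only the first maximum, so ties keep a single row. An id whose sbpval values are all NA would be dropped. Growing result with rbind() is fine for a handful of groups, but it gets slow on large data.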
Is this what you want?
subset(
df,
ave(sbpval, newid, FUN = function(x) max(x, na.rm = TRUE)) == sbpval
)
which gives
newid sex visitnum sbpval
37 13580 M 3 124
39 21525 F 2 410
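To see why this works, it can help to print the intermediate pieces. ave() returns a vector as long as df in which every row carries its group's maximum. Comparing that vector with sbpval marks the rows to keep. This sketch rebuilds the same data shown in the dput() output in this answer. The names grp_max and keep are only illustrative.

```r
df <- structure(list(newid = c(13580L, 13580L, 13580L, 21525L, 21525L),
                     sex = c("M", "M", "M", "F", "F"),
                     visitnum = c(2L, 3L, 4L, 2L, 3L),
                     sbpval = c(NA, 124L, 116L, 410L, 116L)),
                class = "data.frame",
                row.names = c("36", "37", "38", "39", "40"))

# One value per row: the maximum sbpval of that row's newid
grp_max <- ave(df$sbpval, df$newid, FUN = function(x) max(x, na.rm = TRUE))
grp_max  # 124 124 124 410 410

# Row-by-row comparison; the NA row gives NA, which subset() treats as FALSE
keep <- grp_max == df$sbpval
keep     # NA TRUE FALSE TRUE FALSE

subset(df, keep)  # rows "37" and "39"
```

All rows tied for the maximum are kept. If an id had only NA values, max(x, na.rm = TRUE) would return -Inf with a warning, and that id would drop out of the result.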
Data
> dput(df)
structure(list(newid = c(13580L, 13580L, 13580L, 21525L, 21525L
), sex = c("M", "M", "M", "F", "F"), visitnum = c(2L, 3L, 4L,
2L, 3L), sbpval = c(NA, 124L, 116L, 410L, 116L)), class = "data.frame", row.names = c("36",
"37", "38", "39", "40"))
Another option is slice_max from dplyr:
library(dplyr)
df %>%
group_by(newid) %>%
slice_max(sbpval) %>%
ungroup
A base R solution
df <- structure(list(newid = c(13580L, 13580L, 13580L, 21525L, 21525L
), sex = c("M", "M", "M", "F", "F"), visitnum = c(2L, 3L, 4L,
2L, 3L), sbpval = c(NA, 124L, 116L, 410L, 116L)), class = "data.frame", row.names = c("36",
"37", "38", "39", "40"))
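# Group by newid rather than sex, so different ids of the same sex are not collapsed
# (the formula method of aggregate() drops the NA row by default)
df <- merge(aggregate(sbpval ~ newid, max, data = df), df)
# Reorder the columns by name
df[, c("newid", "sex", "visitnum", "sbpval")]
#> newid sex visitnum sbpval
#> 1 13580 M 3 124
#> 2 21525 F 2 410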
Created on 2021-03-12 by the reprex package (v0.3.0)
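Another base R idiom, not taken from the answers above, is to sort the data so each id's largest sbpval comes first. Then keep the first row per id with duplicated(). Here is a minimal sketch using the same data. The names ord and top are only illustrative.

```r
df <- structure(list(newid = c(13580L, 13580L, 13580L, 21525L, 21525L),
                     sex = c("M", "M", "M", "F", "F"),
                     visitnum = c(2L, 3L, 4L, 2L, 3L),
                     sbpval = c(NA, 124L, 116L, 410L, 116L)),
                class = "data.frame",
                row.names = c("36", "37", "38", "39", "40"))

# Sort by newid, then by sbpval descending; order() puts NAs last by default
ord <- df[order(df$newid, -df$sbpval), ]

# The first row of each newid is now the one with its largest sbpval
top <- ord[!duplicated(ord$newid), ]
top  # rows "37" and "39"
```

The ave() and merge() versions keep all tied rows. This version instead returns exactly one row per newid. Because NAs sort last, an NA row is only picked when an id has no other value.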