How to find the average of the n largest values by group and add it as a new column in an R data table
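I read through many similar questions before asking a new one, but here I am. I have a very long data table containing plot, dbh, and other columns. A sample of my data: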
structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), dbh = c(18L, 14L,
13L, 20L, 20L, 15L, 9L, 12L, 22L, 21L, 14L, 14L, 13L, 18L, 24L,
19L, 13L, 15L, 17L, 22L, 11L)), class = "data.frame", row.names = c(NA,
-21L))
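What I want is to find the mean of the 5 largest dbh values within each group (plot) and add that value as a new column to the same data table. I expect the following result: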
structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), dbh = c(18L, 14L,
13L, 20L, 20L, 15L, 9L, 12L, 22L, 21L, 14L, 14L, 13L, 18L, 24L,
19L, 13L, 15L, 17L, 22L, 11L), dom = c(17.4, 17.4, 17.4, 17.4,
17.4, 17.4, 17.4, 17.4, 21.6, 21.6, 21.6, 21.6, 21.6, 21.6, 21.6,
21.6, 21.6, 21.6, 21.6, 21.6, 21.6)), class = "data.frame", row.names = c(NA,
-21L))
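Thank you very much for your help.
PS: I tried many different snippets with different packages (data.table, dplyr, etc.) but could not get any of them to work, so I am not including a non-working MWE, since it would not be useful.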
You can use head/tail to get the top 5 values:
df$dom <- with(df, ave(dbh, plot, FUN = function(x) mean(tail(sort(x), 5))))
# Equivalent when every group has at least 5 rows: take elements 1:5 of the descending sort
# df$dom <- with(df, ave(dbh, plot, FUN = function(x)
#   mean(sort(x, decreasing = TRUE)[1:5])))
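The two variants differ for small groups: indexing with [1:5] pads the result with NA when a group has fewer than 5 rows, so the mean becomes NA, whereas head/tail return whatever values exist. The following is a minimal base R sketch of that behaviour. The helper name top_n_mean is hypothetical (it is not part of the original answer), and dropping NA values before ranking is an added assumption.

```r
# Hypothetical helper (not from the original answer): mean of the n largest values.
top_n_mean <- function(x, n = 5) {
  x <- x[!is.na(x)]                           # assumption: ignore missing values
  mean(head(sort(x, decreasing = TRUE), n))   # head() returns fewer than n values for small groups
}

# Same sample data as in the question, rebuilt compactly
df <- data.frame(
  plot = rep(1:2, times = c(8, 13)),
  dbh  = c(18, 14, 13, 20, 20, 15, 9, 12,
           22, 21, 14, 14, 13, 18, 24, 19, 13, 15, 17, 22, 11)
)

# ave() broadcasts the per-group value back to every row of that group
df$dom <- ave(df$dbh, df$plot, FUN = top_n_mean)
unique(df[, c("plot", "dom")])   # plot 1: 17.4, plot 2: 21.6

# A group with only 3 values: head() still gives a usable mean, [1:5] does not
small <- c(3, 1, 2)
top_n_mean(small, 5)                          # 2
mean(sort(small, decreasing = TRUE)[1:5])     # NA
```

Whether a short group should yield the mean of what is available or NA depends on the analysis; the helper above picks the former.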
Or with dplyr:
library(dplyr)
df %>% group_by(plot) %>% mutate(dom = mean(tail(sort(dbh), 5)))
And with data.table:
library(data.table)
setDT(df)[, dom := mean(tail(sort(dbh), 5)), plot]
df
# plot dbh dom
# 1: 1 18 17.4
# 2: 1 14 17.4
# 3: 1 13 17.4
# 4: 1 20 17.4
# 5: 1 20 17.4
# 6: 1 15 17.4
# 7: 1 9 17.4
# 8: 1 12 17.4
# 9: 2 22 21.6
#10: 2 21 21.6
#11: 2 14 21.6
#12: 2 14 21.6
#13: 2 13 21.6
#14: 2 18 21.6
#15: 2 24 21.6
#16: 2 19 21.6
#17: 2 13 21.6
#18: 2 15 21.6
#19: 2 17 21.6
#20: 2 22 21.6
#21: 2 11 21.6
# plot dbh dom
dplyr also has the slice_max function (which supersedes top_n) to get the rows with the highest n values in each group.
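df %>%
  group_by(plot) %>%
  # with_ties = FALSE keeps at most 5 rows per group even when values tie at the cutoff
  slice_max(dbh, n = 5, with_ties = FALSE) %>%
  summarise(dom = mean(dbh)) %>%
  # join the per-plot mean back onto every original row
  left_join(df, by = 'plot')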
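The summarise-then-join pattern above can also be sketched without any packages. This is only an illustration of the same two steps in base R, using aggregate() for the per-plot summary and merge() for the join; the names dom_by_plot and result are made up for the example.

```r
# Same sample data as in the question, rebuilt compactly
df <- data.frame(
  plot = rep(1:2, times = c(8, 13)),
  dbh  = c(18, 14, 13, 20, 20, 15, 9, 12,
           22, 21, 14, 14, 13, 18, 24, 19, 13, 15, 17, 22, 11)
)

# Step 1: one row per plot holding the mean of its 5 largest dbh values
dom_by_plot <- aggregate(
  dbh ~ plot, data = df,
  FUN = function(x) mean(head(sort(x, decreasing = TRUE), 5))
)
names(dom_by_plot)[2] <- "dom"

# Step 2: join the per-plot summary back onto every original row
result <- merge(df, dom_by_plot, by = "plot")
head(result)
```

Unlike the ave() and := versions, this builds a new data frame rather than adding the column in place, and merge() may reorder rows.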