Select rows by most recent year
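I have a data frame of vegetation metrics collected over several years at x units and y sampling stations (multiple stations within each unit). I want to select all vegetation data for each unit from the most recent year in which data were collected. Here is an example of my data frame: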
veg <- c("tree","grass","tree","grass","tree","grass","tree","grass")
cover <- c(0.97,0.21,0.35,0.67,0.45,0.72,0.27,0.67)
unit <- c("U1","U1","U1","U1","U2","U2","U2","U2")
station <- c("A1","A1","A2","A2","A3","A3","A4","A4")
year <- c(2015,2015,2014,2014,2013,2013,2014,2014)
df <- data.frame(veg,cover,unit,station,year)
The data frame looks like this:
    veg cover unit station year
1  tree  0.97   U1      A1 2015
2 grass  0.21   U1      A1 2015
3  tree  0.35   U1      A2 2014
4 grass  0.67   U1      A2 2014
5  tree  0.45   U2      A3 2013
6 grass  0.72   U2      A3 2013
7  tree  0.27   U2      A4 2014
8 grass  0.67   U2      A4 2014
I would like it to look like this:
    veg cover unit station year
1  tree  0.97   U1      A1 2015
2 grass  0.21   U1      A1 2015
3  tree  0.27   U2      A4 2014
4 grass  0.67   U2      A4 2014
Any help would be appreciated.
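Is this what you are after, the most recent row for each veg/unit combination?
library(dplyr)

df %>%
  group_by(veg, unit) %>%   # one group per vegetation type within each unit
  arrange(desc(year)) %>%   # most recent year first
  slice(1)                  # keep the first (most recent) row of each group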
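Note that slice(1) keeps a single row per veg/unit group, so it would drop rows if a unit had several stations sampled in its most recent year. If the goal is to keep every row from each unit's most recent year, one possible variant is to group by unit only and filter on the maximum year. This is a sketch rather than the answer's original code, and the name recent is chosen here purely for illustration.

```r
library(dplyr)

# Rebuild the example data frame from the question
df <- data.frame(
  veg     = c("tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass"),
  cover   = c(0.97, 0.21, 0.35, 0.67, 0.45, 0.72, 0.27, 0.67),
  unit    = c("U1", "U1", "U1", "U1", "U2", "U2", "U2", "U2"),
  station = c("A1", "A1", "A2", "A2", "A3", "A3", "A4", "A4"),
  year    = c(2015, 2015, 2014, 2014, 2013, 2013, 2014, 2014)
)

# 'recent' is an illustrative name, not part of the original answer
recent <- df %>%
  group_by(unit) %>%              # one group per unit
  filter(year == max(year)) %>%   # keep all rows from that unit's latest year
  ungroup()

recent
```

On the example data this keeps the A1 rows for U1 and the A4 rows for U2, in their original order.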
Here is how to do it without any packages.
df.by = by(df, df$unit, FUN = function(t) t[t$year == max(t$year), ])
df.recent = Reduce(function(...) merge(..., all = TRUE), df.by)
df.recent
The output is
> df.recent
    veg cover unit station year
1 grass  0.21   U1      A1 2015
2 grass  0.67   U2      A4 2014
3  tree  0.27   U2      A4 2014
4  tree  0.97   U1      A1 2015
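In the first line, we use the function by to split the data frame into subsets by the factor df$unit. For each subset (that is, for each unit), the anonymous function function(t) t[t$year == max(t$year), ] extracts the rows from the most recent year.
df.by is a list of data frames, each containing only the rows from the most recent year for its unit.
In the second line, we use the merge function to merge all the data frames in df.by. The use of this code is explained in Simultaneously merge multiple data.frames in a list.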
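As an alternative that avoids the merge step (and so keeps the original row order), the same selection can be sketched with base R's ave(), which returns the per-unit maximum year aligned to every row. This is one possible approach, not the method described above. The names latest and df.recent2 are assumptions introduced here for illustration.

```r
# Rebuild the example data frame from the question
veg <- c("tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass")
cover <- c(0.97, 0.21, 0.35, 0.67, 0.45, 0.72, 0.27, 0.67)
unit <- c("U1", "U1", "U1", "U1", "U2", "U2", "U2", "U2")
station <- c("A1", "A1", "A2", "A2", "A3", "A3", "A4", "A4")
year <- c(2015, 2015, 2014, 2014, 2013, 2013, 2014, 2014)
df <- data.frame(veg, cover, unit, station, year)

# For every row, the most recent year recorded within that row's unit
latest <- ave(df$year, df$unit, FUN = max)

# Keep only the rows whose year equals the latest year of their unit
df.recent2 <- df[df$year == latest, ]
rownames(df.recent2) <- NULL   # renumber rows 1..n
df.recent2
```

Because the rows are selected by a logical index, they come back in the order they appear in df, which matches the layout requested in the question.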