Matching sites in different data frames in R
I have several data frames like the ones below. They contain many more species columns, which I have not shown here.
d1:
Year Region Sites Depth Transect Pharia pyramidatus
2000 LP BALLENA 5 1 0.03
2000 LP ISLOTES 5 1 0.20
2000 LP NORTE 5 1 0.10
2000 LP NORTE 20 1 0.00
d2:
Year Region Sites Depth Transect Pharia pyramidatus
2010 LP PLAYA 5 1 0.03
2010 LP ISLOTES 5 1 0.20
2010 LP NORTE 5 1 0.10
2010 LP NORTE 20 1 0.00
d3:
Year Region Sites Depth Transect Pharia pyramidatus
2016 LP BALLENA 5 1 0.03
2016 LP ISLOTES 5 1 0.20
2016 LP SUR 5 1 0.10
2016 LP NORTE 20 1 0.00
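What I want to do is extract only the sites (Reef) that appear in every year and bind the results into a single data frame, which should look like this: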
Year Region Reef Depth Transect Pharia pyramidatus
2000 LP ISLOTES 5 1 0.20
2000 LP NORTE 5 1 0.10
2000 LP NORTE 20 1 0.00
2010 LP ISLOTES 5 1 0.20
2010 LP NORTE 5 1 0.10
2010 LP NORTE 20 1 0.00
2016 LP ISLOTES 5 1 0.20
2016 LP NORTE 20 1 0.00
Thank you very much for your help.
A solution with dplyr:
library(dplyr)

# Stack the three data frames, then keep only reefs recorded in all 3 years
rbind(df1, df2, df3) %>%
  group_by(Reef) %>%
  filter(n_distinct(Year) == 3)
Result:
# A tibble: 8 x 6
# Groups: Reef [2]
Year Region Reef Depth Transect Pharia_pyramidatus
<int> <fctr> <fctr> <int> <int> <dbl>
1 2000 LP ISLOTES 5 1 0.2
2 2000 LP NORTE 5 1 0.1
3 2000 LP NORTE 20 1 0.0
4 2010 LP ISLOTES 5 1 0.2
5 2010 LP NORTE 5 1 0.1
6 2010 LP NORTE 20 1 0.0
7 2016 LP ISLOTES 5 1 0.2
8 2016 LP NORTE 20 1 0.0
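Notes:
n_distinct counts the number of distinct Year values for each Reef (because I group_by(Reef)). I use n_distinct(Year) == 3 because I only want to return rows where the Reef has records for every Year, which is 3 years in this example. In the more general case, where there are more years, you may want to first find the number of distinct Year values in the whole data frame and then filter on that, as follows: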
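# Count the distinct years in the combined data before grouping,
# then keep only reefs recorded in that many years
rbind(df1, df2, df3) %>%
  mutate(Year_distinct = n_distinct(Year)) %>%
  group_by(Reef) %>%
  filter(n_distinct(Year) == Year_distinct) %>%
  select(-Year_distinct)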
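To see why the filter keeps only ISLOTES and NORTE, it can help to look at the per-reef year counts directly. The following is a minimal sketch, not part of the original answer: the names all_years, year_counts, total_years, complete_reefs and kept are illustrative, and the data is read with stringsAsFactors = FALSE (an assumption) so that Reef is a plain character column.

```r
library(dplyr)

df1 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2000 LP BALLENA 5 1 0.03
2000 LP ISLOTES 5 1 0.20
2000 LP NORTE 5 1 0.10
2000 LP NORTE 20 1 0.00", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2010 LP PLAYA 5 1 0.03
2010 LP ISLOTES 5 1 0.20
2010 LP NORTE 5 1 0.10
2010 LP NORTE 20 1 0.00", header = TRUE, stringsAsFactors = FALSE)
df3 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2016 LP BALLENA 5 1 0.03
2016 LP ISLOTES 5 1 0.20
2016 LP SUR 5 1 0.10
2016 LP NORTE 20 1 0.00", header = TRUE, stringsAsFactors = FALSE)

# Stack all years into one data frame (illustrative name)
all_years <- bind_rows(df1, df2, df3)

# One row per Reef with the number of distinct years it was recorded in
year_counts <- all_years %>%
  group_by(Reef) %>%
  summarise(n_years = n_distinct(Year))

# Number of distinct years in the whole data set (3 here)
total_years <- n_distinct(all_years$Year)

# Reefs recorded in every year, and the rows belonging to them
complete_reefs <- year_counts$Reef[year_counts$n_years == total_years]
kept <- all_years %>% filter(Reef %in% complete_reefs)

print(year_counts)
print(kept)
```

BALLENA appears in two years and PLAYA and SUR in one each, so only ISLOTES and NORTE reach the total of three and survive the filter.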
Data:
df1 = read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2000 LP BALLENA 5 1 0.03
2000 LP ISLOTES 5 1 0.20
2000 LP NORTE 5 1 0.10
2000 LP NORTE 20 1 0.00", header = TRUE)
df2 = read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2010 LP PLAYA 5 1 0.03
2010 LP ISLOTES 5 1 0.20
2010 LP NORTE 5 1 0.10
2010 LP NORTE 20 1 0.00", header = TRUE)
df3 = read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2016 LP BALLENA 5 1 0.03
2016 LP ISLOTES 5 1 0.20
2016 LP SUR 5 1 0.10
2016 LP NORTE 20 1 0.00", header = TRUE)
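For comparison, the same selection could be sketched in base R without dplyr by intersecting the reef names of the individual data frames. This is an alternative technique rather than the approach given above, and it assumes that each data frame holds exactly one year; the names frames, common_reefs, combined and result are illustrative, and stringsAsFactors = FALSE is added so that Reef is a character column.

```r
df1 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2000 LP BALLENA 5 1 0.03
2000 LP ISLOTES 5 1 0.20
2000 LP NORTE 5 1 0.10
2000 LP NORTE 20 1 0.00", header = TRUE, stringsAsFactors = FALSE)
df2 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2010 LP PLAYA 5 1 0.03
2010 LP ISLOTES 5 1 0.20
2010 LP NORTE 5 1 0.10
2010 LP NORTE 20 1 0.00", header = TRUE, stringsAsFactors = FALSE)
df3 <- read.table(text = "Year Region Reef Depth Transect Pharia_pyramidatus
2016 LP BALLENA 5 1 0.03
2016 LP ISLOTES 5 1 0.20
2016 LP SUR 5 1 0.10
2016 LP NORTE 20 1 0.00", header = TRUE, stringsAsFactors = FALSE)

# Assumption: one data frame per year
frames <- list(df1, df2, df3)

# Reef names that occur in every data frame
common_reefs <- Reduce(intersect, lapply(frames, function(d) unique(d$Reef)))

# Stack the data frames and keep only rows for those reefs
combined <- do.call(rbind, frames)
result <- combined[combined$Reef %in% common_reefs, ]

print(result)
```

If a single data frame could contain several years, the intersection would no longer correspond to "present in every year", and the grouped n_distinct(Year) filter would be the safer choice.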