Create a new column in R based on conditions in two other columns
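I am currently working with some ecological research data and have been trying to solve this for hours.
I have a data frame like this one, but much larger: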
beetles <- data.frame(
  Area = c("A","A","A","B","B","B","C","C","D","D","D","D"),
  Year = c(1993, 1994, 1994, 1994, 1995, 1995, 1996, 1997, 1998, 1997, 1996, 1996),
  species = c("Harpalus latus","Amara ovata","Harpalus latus","Dromius agilis",
              "Amara ovata","Harpalus latus","Amara ovata","Harpalus latus",
              "Harpalus latus","Amara ovata","Dromius agilis","Harpalus latus"),
  field_season = c(1,2,2,1,2,2,1,2,3,2,1,1)
)
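What I am trying to do: I have beetle data from 4 study areas, sampled over a period of several years. For the analysis I need a column that contains, for each study area, the number of the field season (field_season) in which each species was caught. I am looking for the column called "field_season", which is currently not in my data.frame. To give more context: for the analysis I want to split my data set and see how much the beetle communities differ between field seasons.
I tried using: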
beetles %>% group_by(Area) %>% mutate(field_season = year ?)
but I don't know how to continue. If anyone could point me in the right direction, I would greatly appreciate it.
beetles %>%
dplyr::group_by(Area) %>%
dplyr::summarise(sum_season = sum(field_season)) %>%
dplyr::left_join(beetles)
Like this?
Joining, by = "Area"
# A tibble: 12 x 5
Area sum_season Year species field_season
<chr> <dbl> <dbl> <chr> <dbl>
1 A 5 1993 Harpalus latus 1
2 A 5 1994 Amara ovata 2
3 A 5 1994 Harpalus latus 2
4 B 5 1994 Dromius agilis 1
5 B 5 1995 Amara ovata 2
6 B 5 1995 Harpalus latus 2
7 C 3 1996 Amara ovata 1
8 C 3 1997 Harpalus latus 2
9 D 7 1998 Harpalus latus 3
10 D 7 1997 Amara ovata 2
11 D 7 1996 Dromius agilis 1
12 D 7 1996 Harpalus latus 1
I am not sure whether you want to count by Area only, or by Area and Year.
- Grouping by Area
> within(beetles, counts <- ave(field_season,Area,FUN = sum))
Area Year species field_season counts
1 A 1993 Harpalus latus 1 5
2 A 1994 Amara ovata 2 5
3 A 1994 Harpalus latus 2 5
4 B 1994 Dromius agilis 1 5
5 B 1995 Amara ovata 2 5
6 B 1995 Harpalus latus 2 5
7 C 1996 Amara ovata 1 3
8 C 1997 Harpalus latus 2 3
9 D 1998 Harpalus latus 3 7
10 D 1997 Amara ovata 2 7
11 D 1996 Dromius agilis 1 7
12 D 1996 Harpalus latus 1 7
- Grouping by Area + Year
> within(beetles, counts <- ave(field_season,Area,Year, FUN = sum))
Area Year species field_season counts
1 A 1993 Harpalus latus 1 1
2 A 1994 Amara ovata 2 4
3 A 1994 Harpalus latus 2 4
4 B 1994 Dromius agilis 1 1
5 B 1995 Amara ovata 2 4
6 B 1995 Harpalus latus 2 4
7 C 1996 Amara ovata 1 1
8 C 1997 Harpalus latus 2 2
9 D 1998 Harpalus latus 3 3
10 D 1997 Amara ovata 2 2
11 D 1996 Dromius agilis 1 2
12 D 1996 Harpalus latus 1 2
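A side note on why ave() fits here: it returns a vector as long as its input, with each group's result repeated for every row of that group, so the result can be assigned directly as a new column. A minimal base R sketch, where x and g are illustrative vectors rather than the question's data frame:

```r
# Illustrative vectors only: values and a grouping key of equal length
x <- c(1, 2, 2, 1, 2, 2)
g <- c("A", "A", "A", "B", "B", "B")

# ave() keeps one value per row: the group sum is repeated within each group
per_row <- ave(x, g, FUN = sum)

# tapply() collapses to one value per group instead
per_group <- tapply(x, g, sum)

print(per_row)
print(per_group)
```

Because per_row has the same length as x, it lines up row by row with the original data, which tapply() output does not.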
You can use dense_rank from dplyr:
library(dplyr)
beetles %>% group_by(Area) %>% mutate(field_season_ans = dense_rank(Year))
# Area Year species field_season field_season_ans
# <chr> <dbl> <chr> <dbl> <int>
# 1 A 1993 Harpalus latus 1 1
# 2 A 1994 Amara ovata 2 2
# 3 A 1994 Harpalus latus 2 2
# 4 B 1994 Dromius agilis 1 1
# 5 B 1995 Amara ovata 2 2
# 6 B 1995 Harpalus latus 2 2
# 7 C 1996 Amara ovata 1 1
# 8 C 1997 Harpalus latus 2 2
# 9 D 1998 Harpalus latus 3 3
#10 D 1997 Amara ovata 2 2
#11 D 1996 Dromius agilis 1 1
#12 D 1996 Harpalus latus 1 1
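If dplyr is not available, the same per-area ranking can probably be sketched in base R. This is an alternative under stated assumptions, not the answerer's code: within each Area, match() looks up the position of each Year among that area's sorted unique years, which is what a dense rank amounts to. The names field_season_base, expected, and by_season are made up for illustration.

```r
# Example data from the question, without the desired field_season column
beetles <- data.frame(
  Area = c("A","A","A","B","B","B","C","C","D","D","D","D"),
  Year = c(1993, 1994, 1994, 1994, 1995, 1995, 1996, 1997, 1998, 1997, 1996, 1996),
  species = c("Harpalus latus","Amara ovata","Harpalus latus","Dromius agilis",
              "Amara ovata","Harpalus latus","Amara ovata","Harpalus latus",
              "Harpalus latus","Amara ovata","Dromius agilis","Harpalus latus")
)

# Dense rank of Year within each Area: position among the sorted unique years
beetles$field_season_base <- ave(
  beetles$Year, beetles$Area,
  FUN = function(y) match(y, sort(unique(y)))
)

# Values the question lists as the desired field_season
expected <- c(1, 2, 2, 1, 2, 2, 1, 2, 3, 2, 1, 1)
stopifnot(all(beetles$field_season_base == expected))

# Hypothetical follow-up: one data frame per field season, for comparing communities
by_season <- split(beetles, beetles$field_season_base)
```

Note that this numbers only the years that actually occur in each area, so a year with no catches in an area does not get its own season number. The same holds for dense_rank.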