Group by multiple columns (Date, GEO), then rank by a third column (VALUE) in R
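I have a data frame with the columns Date, GEO, and VALUE. I want to group first by Date, then by GEO, and finally create a rank by VALUE in descending order.
Here is my df1. Any NA values should be ranked last.
My output should be: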
structure(list(Date = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("01/01/2020",
"01/01/2021"), class = "factor"), GEO = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Barrie",
"Toronto"), class = "factor"), NAICS = structure(c(14L, 17L,
8L, 10L, 14L, 17L, 8L, 10L, 14L, 17L, 12L, 6L, 14L, 17L, 12L,
6L), .Label = c("Accomd. & food serv.", "Agriculture", "Bus., build.,& other supp. services",
"Construction", "Educational services", "Fin., Insur., Real est., Rental, & lease",
"Forest.,Fish.,Mining.,Quar.,Oil & Gas", "Health care & Social Assis",
"Info., culture & rec.", "Manufacturing", "Other services (except pub. admin)",
"Prof., Sci.,and Tech. Serv", "Public administration", "Total",
"Transp. & warehousing", "Utilities", "Wholesale and retail trade"
), class = "factor"), VALUE = c(114.5, 17.6, 15.1, 14.4, 117.6,
17.1, NA, NA, 3393.4, 520.5, 443.9, 393.4, 3221.8, 486.2,
414.2, 400.5), rank = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L)), class = "data.frame", row.names = c(NA,
-16L))
My code does not work properly because it does not order by Date, then GEO, then VALUE:
df1 %>%
arrange(GEO, desc(VALUE)) %>%
group_by(GEO) %>%
mutate(rank = row_number()) -> df1
Based on your description, you should group by GEO first and then by Date:
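library(dplyr)

df1 %>%
  group_by(GEO, Date) %>%                      # one group per GEO/Date pair
  arrange(desc(VALUE), .by_group = TRUE) %>%   # sort within each group; arrange() always puts NA last
  mutate(rank.new = row_number())              # rank = row position within the group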
# A tibble: 16 x 6
# Groups:   GEO, Date [4]
   Date       GEO     NAICS                                    VALUE  rank rank.new
   <fct>      <fct>   <fct>                                    <dbl> <int>    <int>
 1 01/01/2020 Barrie  Total                                     118.     1        1
 2 01/01/2020 Barrie  Wholesale and retail trade                17.1     2        2
 3 01/01/2020 Barrie  Health care & Social Assis                  NA     3        3
 4 01/01/2020 Barrie  Manufacturing                               NA     4        4
 5 01/01/2021 Barrie  Total                                     114.     1        1
 6 01/01/2021 Barrie  Wholesale and retail trade                17.6     2        2
 7 01/01/2021 Barrie  Health care & Social Assis                15.1     3        3
 8 01/01/2021 Barrie  Manufacturing                             14.4     4        4
 9 01/01/2020 Toronto Total                                    3222.     1        1
10 01/01/2020 Toronto Wholesale and retail trade                486.     2        2
11 01/01/2020 Toronto Prof., Sci.,and Tech. Serv                414.     3        3
12 01/01/2020 Toronto Fin., Insur., Real est., Rental, & lease  400.     4        4
13 01/01/2021 Toronto Total                                    3393.     1        1
14 01/01/2021 Toronto Wholesale and retail trade                520.     2        2
15 01/01/2021 Toronto Prof., Sci.,and Tech. Serv                444.     3        3
16 01/01/2021 Toronto Fin., Insur., Real est., Rental, & lease  393.     4        4
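If you would rather keep the original row order, a possible variant is to compute the rank directly instead of sorting first. The sketch below is only an illustration: `toy` is a made-up miniature of df1 (same column names, fewer rows), and it swaps `arrange()` plus `row_number()` for base R's `rank()` on the negated values, with `na.last = TRUE` so that NA gets the largest rank in its group.

```r
library(dplyr)

# Hypothetical miniature of df1, for illustration only (not the asker's data).
toy <- data.frame(
  Date  = c("01/01/2020", "01/01/2020", "01/01/2020", "01/01/2021", "01/01/2021"),
  GEO   = c("Barrie", "Barrie", "Barrie", "Barrie", "Barrie"),
  VALUE = c(17.1, NA, 117.6, 15.1, 114.5)
)

ranked <- toy %>%
  group_by(GEO, Date) %>%
  # Negating VALUE turns the ascending rank into a descending one.
  # na.last = TRUE ranks NA after all non-missing values;
  # ties.method = "first" breaks ties by order of appearance.
  mutate(rank.new = rank(-VALUE, ties.method = "first", na.last = TRUE)) %>%
  ungroup()

print(ranked)
```

The rows stay in their original order, so in the 2020 group the NA row gets rank 3 while 117.6 gets rank 1. Note that `row_number(desc(VALUE))` would not do the same thing here, since dplyr's ranking helpers return NA for missing inputs.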