Summarize columns in an R data frame independent of order: (df$A, df$B) = (df$B, df$A)
I have the following data frame:
destiny origin Count
1 KJFK SBBR 4
2 KJFK SAEZ 4683
3 SBGL KJFK 2
4 SBBR KJFK 2
5 KJFK SBGL 4987
6 KJFK SBGR 12911
...
I'm interested in routes, so for me KJFK -> SBBR is the same as SBBR -> KJFK. I'd therefore like to sum their counts, as in the table below:
destiny origin Count
1 KJFK SBBR 6
2 KJFK SAEZ 4683
3 SBGL KJFK 4989
4 KJFK SBGR 12911
...
I don't want to use a big for loop to evaluate every value.
How about this?
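library(tidyverse)
df %>%
  # work with character columns rather than factors
  mutate_if(is.factor, as.character) %>%
  rowwise() %>%
  # order-independent key: the two codes sorted alphabetically, joined by "_"
  mutate(grp = paste0(sort(c(destiny, origin)), collapse = "_")) %>%
  ungroup() %>%
  group_by(grp) %>%
  summarise(Count = sum(Count)) %>%
  # split the key back into two columns
  separate(grp, into = c("destiny", "origin"), sep = "_")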
# A tibble: 4 x 3
# destiny origin Count
# <chr> <chr> <int>
#1 KJFK SAEZ 4683
#2 KJFK SBBR 6
#3 KJFK SBGL 4989
#4 KJFK SBGR 12911
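Note that since you don't care about the order of destiny and origin, each pair is put in alphabetical order here. So, in the example you gave above, KJFK -> SBBR and SBBR -> KJFK both become destiny = KJFK, origin = SBBR.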
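The same key idea can also be sketched in base R, without the tidyverse. This is only an illustration: the names `routes` and `route_key` are made up for this example, and the data are the six rows from the question.

```r
# Illustrative data frame (name `routes` is an assumption), rows taken from the question
routes <- data.frame(
  destiny = c("KJFK", "KJFK", "SBGL", "SBBR", "KJFK", "KJFK"),
  origin  = c("SBBR", "SAEZ", "KJFK", "KJFK", "SBGL", "SBGR"),
  Count   = c(4L, 4683L, 2L, 2L, 4987L, 12911L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort each pair of codes and join them with "_",
# so that A -> B and B -> A produce the same key
route_key <- function(a, b) {
  vapply(seq_along(a),
         function(i) paste(sort(c(a[i], b[i])), collapse = "_"),
         character(1))
}

routes$grp <- route_key(routes$destiny, routes$origin)

# Sum Count within each key; the result is a named array indexed by key
totals <- tapply(routes$Count, routes$grp, sum)
```

Here `totals[["KJFK_SBBR"]]` holds the combined count for both directions of that route.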
Sample data
df <- read.table(text =
" destiny origin Count
1 KJFK SBBR 4
2 KJFK SAEZ 4683
3 SBGL KJFK 2
4 SBBR KJFK 2
5 KJFK SBGL 4987
6 KJFK SBGR 12911", header = TRUE)
Here is an option using pmin/pmax:
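library(tidyverse)
df1 %>%
  # pmin/pmax give the alphabetically smaller/larger code of each pair
  group_by(destinyN = pmin(destiny, origin), originN = pmax(destiny, origin)) %>%
  # keep the first direction seen for each route and sum the counts
  summarise(destiny = first(destiny),
            origin = first(origin),
            Count = sum(Count)) %>%
  ungroup() %>%
  select(-destinyN, -originN)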
# A tibble: 4 x 3
# destiny origin Count
# <chr> <chr> <int>
#1 KJFK SAEZ 4683
#2 KJFK SBBR 6
#3 SBGL KJFK 4989
#4 KJFK SBGR 12911
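To see why this grouping works, here is a minimal sketch of what pmin/pmax return for character vectors. The vector names `lo` and `hi` are made up for the illustration, and it assumes the columns are character (if they are factors, converting with as.character first is the safer route).

```r
destiny <- c("KJFK", "SBGL", "SBBR")
origin  <- c("SBBR", "KJFK", "KJFK")

# Element-wise alphabetical minimum and maximum of each pair
lo <- pmin(destiny, origin)
hi <- pmax(destiny, origin)

# Rows 1 and 3 are the two directions of the same route,
# so they end up with the same (lo, hi) pair
same_route <- lo[1] == lo[3] && hi[1] == hi[3]
```

Grouping by the (lo, hi) pair therefore treats KJFK -> SBBR and SBBR -> KJFK as one route, while first() keeps whichever direction appears first in the data.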
Data
df1 <- structure(list(destiny = c("KJFK", "KJFK", "SBGL", "SBBR", "KJFK",
"KJFK"), origin = c("SBBR", "SAEZ", "KJFK", "KJFK", "SBGL", "SBGR"
), Count = c(4L, 4683L, 2L, 2L, 4987L, 12911L)), .Names = c("destiny",
"origin", "Count"), row.names = c("1", "2", "3", "4", "5", "6"
), class = "data.frame")