Summarise groups across columns and rows
I have a data frame df:
Year Score x1 x2 x3
2006 102 K P 8
2006 89 L K P
2006 46 P 3 0
2007 76 L 2 1
2007 29 L K 6
2008 690 P 4 4
2008 301 K 0 1
... ... .. .. ..
However, I would like it to look like this:
Year K P L K_prop P_prop L_prop
2006 191 191 135 0.37 0.37 0.26
2007 29 105 0.22 0.78
2008 301 690 0.30 0.70
... .. .. .. .. .. ..
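where each letter found in the x columns becomes a column holding the sum of Score for that letter, grouped by Year. I would also like additional columns giving each letter's share of the total:
K_prop = K/(K+P+L); P_prop = P/(K+P+L); L_prop = L/(K+P+L)
Apologies if the description is not detailed enough, and thank you very much for any help!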
We can reshape the data to 'long' format with pivot_longer, do the calculation, and reshape it back to 'wide' format with pivot_wider:
library(dplyr)
library(tidyr)
library(stringr)
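df %>%
  # stack x1, x2, x3 into a single 'value' column
  pivot_longer(cols = starts_with('x')) %>%
  # keep only the letter codes, dropping the numeric entries
  filter(str_detect(value, '[A-Za-z]')) %>%
  # total Score per Year and letter
  group_by(Year, value) %>%
  summarise(Score = sum(Score)) %>%
  ungroup() %>%
  # each letter's share of the yearly total
  group_by(Year) %>%
  mutate(prop = Score/sum(Score)) %>%
  # one column per letter for both Score and prop
  pivot_wider(names_from = value, values_from = c(Score, prop))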
# A tibble: 3 x 7
# Groups: Year [3]
# Year Score_K Score_L Score_P prop_K prop_L prop_P
# <int> <int> <int> <int> <dbl> <dbl> <dbl>
#1 2006 191 89 237 0.369 0.172 0.458
#2 2007 29 105 NA 0.216 0.784 NA
#3 2008 301 NA 690 0.304 NA 0.696
Data
df <- structure(list(Year = c(2006L, 2006L, 2006L, 2007L, 2007L, 2008L,
2008L), Score = c(102L, 89L, 46L, 76L, 29L, 690L, 301L), x1 = c("K",
"L", "P", "L", "L", "P", "K"), x2 = c("P", "K", "3", "2", "K",
"4", "0"), x3 = c("8", "P", "0", "1", "6", "4", "1")),
class = "data.frame", row.names = c(NA,
-7L))
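For comparison, here is a minimal base R sketch of the same long-then-wide idea that needs no packages. It is not the approach used in the answer above, only an illustration under the same sample data; the object names `long`, `totals`, `props` and `result` are made up for this example. It also names the columns the way the question asks (K, K_prop, and so on).

```r
# Same sample data as above, rebuilt so the block runs on its own
df <- data.frame(
  Year  = c(2006L, 2006L, 2006L, 2007L, 2007L, 2008L, 2008L),
  Score = c(102L, 89L, 46L, 76L, 29L, 690L, 301L),
  x1 = c("K", "L", "P", "L", "L", "P", "K"),
  x2 = c("P", "K", "3", "2", "K", "4", "0"),
  x3 = c("8", "P", "0", "1", "6", "4", "1"),
  stringsAsFactors = FALSE
)

# Long format: stack x1..x3, repeating Year and Score alongside each value
long <- data.frame(
  Year  = rep(df$Year, times = 3),
  Score = rep(df$Score, times = 3),
  value = c(df$x1, df$x2, df$x3),
  stringsAsFactors = FALSE
)

# Keep only the letter codes
long <- long[grepl("[A-Za-z]", long$value), ]

# Sum Score per Year (rows) and letter (columns);
# Year/letter combinations that never occur are NA
totals <- tapply(long$Score, list(long$Year, long$value), sum)

# Divide every row by its own total, ignoring the NA cells
props <- totals / rowSums(totals, na.rm = TRUE)
colnames(props) <- paste0(colnames(props), "_prop")

# Wide result: Year, K, L, P, K_prop, L_prop, P_prop
result <- data.frame(
  Year = as.integer(rownames(totals)),
  totals, props,
  check.names = FALSE, row.names = NULL
)
result
```

The sums and proportions should agree with the tibble printed above; only the column names differ.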