Summarise groups across columns and rows

Question

I have a data frame df that looks like this:

 Year      Score    x1   x2   x3 
 2006      102      K    P    8   
 2006      89       L    K    P   
 2006      46       P    3    0   
 2007      76       L    2    1  
 2007      29       L    K    6   
 2008      690      P    4    4   
 2008      301      K    0    1   
 ...       ...      ..   ..   ..

However, I would like it to look like this:

 Year     K    P    L    K_prop  P_prop  L_prop 
 2006     191  191  135  0.37    0.37    0.26    
 2007     29        105  0.22            0.78
 2008     301  690       0.30    0.70
 ...      ..   ..   ..   ..      ..      ..

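Here each letter that appears in the x columns becomes its own column, holding the sum of Score for that letter grouped by year. I would also like additional columns giving each letter's share of the total score.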

K_prop = K/(K+P+L); P_prop = P/(K+P+L); L_prop = L/(K+P+L)

Apologies if the description is not detailed enough; I really appreciate any help!

Answer 1

We can reshape to 'long' format with pivot_longer, do the calculation there, and then reshape back to 'wide' format with pivot_wider.

library(dplyr)
library(tidyr)
library(stringr)
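df %>% 
    # reshape the x columns to long format (columns 'name' and 'value')
    pivot_longer(cols = starts_with('x')) %>% 
    # keep only the letter codes, dropping the numeric entries
    filter(str_detect(value, '[A-Za-z]')) %>% 
    # total Score per Year and letter
    group_by(Year, value) %>%
    summarise(Score = sum(Score)) %>%
    ungroup %>%
    # proportion of each letter within its Year
    group_by(Year) %>%
    mutate(prop = Score/sum(Score)) %>% 
    # back to wide format: one Score_* and one prop_* column per letter
    pivot_wider(names_from = value, values_from = c(Score, prop))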
# A tibble: 3 x 7
# Groups:   Year [3]
#   Year Score_K Score_L Score_P prop_K prop_L prop_P
#  <int>   <int>   <int>   <int>  <dbl>  <dbl>  <dbl>
#1  2006     191      89     237  0.369  0.172  0.458
#2  2007      29     105      NA  0.216  0.784 NA    
#3  2008     301      NA     690  0.304 NA      0.696
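
If the column names should follow the layout asked for in the question (K, L, P plus K_prop, L_prop, P_prop), one option is to rename the columns after pivoting. The following is only a sketch under a few assumptions: it needs dplyr >= 1.0.0 for rename_with and the .groups argument, it rebuilds the sample data so that it runs on its own, it stores the result in a name (out) chosen here for illustration, and it uses base grepl where the code above uses str_detect.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the "Data" section, rebuilt here so the block is self-contained
df <- data.frame(
  Year  = c(2006L, 2006L, 2006L, 2007L, 2007L, 2008L, 2008L),
  Score = c(102L, 89L, 46L, 76L, 29L, 690L, 301L),
  x1 = c("K", "L", "P", "L", "L", "P", "K"),
  x2 = c("P", "K", "3", "2", "K", "4", "0"),
  x3 = c("8", "P", "0", "1", "6", "4", "1")
)

out <- df %>%
  # long format: one row per (original row, x column)
  pivot_longer(cols = starts_with("x")) %>%
  # keep only the letter codes
  filter(grepl("[A-Za-z]", value)) %>%
  # total Score per Year and letter
  group_by(Year, value) %>%
  summarise(Score = sum(Score), .groups = "drop") %>%
  # share of each letter within its Year
  group_by(Year) %>%
  mutate(prop = Score / sum(Score)) %>%
  ungroup() %>%
  pivot_wider(names_from = value, values_from = c(Score, prop)) %>%
  # Score_K -> K, prop_K -> K_prop, and so on
  rename_with(~ sub("^Score_", "", .x), starts_with("Score_")) %>%
  rename_with(~ sub("^prop_(.*)$", "\\1_prop", .x), starts_with("prop_"))

out
```

As a check against the formula in the question, the 2006 row gives K_prop = 191 / (191 + 89 + 237) = 191 / 517, which is about 0.369 and matches the output above. Note that a row whose x columns contain more than one letter contributes its Score to each of those letters.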

Data

df <- structure(list(Year = c(2006L, 2006L, 2006L, 2007L, 2007L, 2008L, 
2008L), Score = c(102L, 89L, 46L, 76L, 29L, 690L, 301L), x1 = c("K", 
"L", "P", "L", "L", "P", "K"), x2 = c("P", "K", "3", "2", "K", 
"4", "0"), x3 = c("8", "P", "0", "1", "6", "4", "1")), 
class = "data.frame", row.names = c(NA, 
-7L))
