Combination of reshaping and aggregating a data.frame

I'm still fairly new to this, so please forgive any mistakes. I currently have a data.frame that looks like this:

Outcome1  Outcome2  Num_Occurances Name
False       False       2          John Doe
False       True        2          John Doe
True        False       4          John Doe
True        True        2          John Doe
False       True        1          Sally Doe
True        False       1          Sally Doe

I want to reshape and aggregate the data into a wider format that ends up looking like this:

Name        successful_Outcome2  Total_Occurances  successful_Outcome1    Total_Occurances_Outcome1 
John Doe           4                  10                   2                        6
Sally Doe          1                   2                   0                        1

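I know the dcast function can be used to recast melted data into wide format, but the combinations of the different outcomes are throwing me off. Any help would be much appreciated!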

Using dplyr:

library(dplyr)

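df %>%
  # make sure the outcome columns are logical (they may have been read in as text or factor)
  mutate_at(vars(starts_with('Outcome')), as.logical) %>%
  group_by(Name) %>%
  # a logical index keeps only the counts of rows where the condition is TRUE
  summarise(successful_Outcome2 = sum(Num_Occurances[Outcome2]),
            Total_Occurances = sum(Num_Occurances),
            successful_outcome1 = sum(Num_Occurances[Outcome1 & Outcome2]),
            Total_Occurances_Outcome1 = sum(Num_Occurances[Outcome1]))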


#  Name     successful_Outcome2 Total_Occurances successful_outcome1 Total_Occurances_Outcome1
#  <fct>                  <int>            <int>               <int>                     <int>
#1 JohnDoe                    4               10                   2                         6
#2 SallyDoe                   1                2                   0                         1

One approach is to expand the data frame first, so that each occurrence gets its own row, and then summarise everything:

library(dplyr)

# repeat each row Num_Occurances times and drop the count column (column 3)
df[rep(seq_len(nrow(df)), df$Num_Occurances), -3] %>%
  group_by(Name) %>%
  summarise(successful_outcome2 = sum(Outcome2),
            Total_Occurances = n(),
            successful_outcome1 = sum(Outcome1 & Outcome2),
            Total_Occurances_Outcome1 = sum(Outcome1))
# A tibble: 2 x 5
  Name  successful_outcome2 Total_Occurances successful_outcome1 Total_Occurances_Outcome1
  <chr>                <int>            <int>              <int>                     <int>
1 John Doe                 4               10                   2                        6
2 Sally Doe                1                2                   0                        1
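
To see what the expansion step does on its own, here is a minimal base R sketch. It rebuilds the example data so that it runs by itself; the names `idx` and `expanded` are only illustrative and do not appear in the answer above.

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(
  Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
  Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  Num_Occurances = c(2L, 2L, 4L, 2L, 1L, 1L),
  Name = c("John Doe", "John Doe", "John Doe", "John Doe",
           "Sally Doe", "Sally Doe"),
  stringsAsFactors = FALSE
)

# Repeat each row index as many times as its count:
# 1 1 2 2 3 3 3 3 4 4 5 6
idx <- rep(seq_len(nrow(df)), df$Num_Occurances)

# One row per occurrence; the count column is no longer needed
expanded <- df[idx, c("Outcome1", "Outcome2", "Name")]

nrow(expanded)        # 12, the sum of Num_Occurances
table(expanded$Name)  # 10 rows for John Doe, 2 for Sally Doe
```

Once every occurrence is its own row, counting rows with `n()` and summing the logical columns gives the same totals as weighting by `Num_Occurances`.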

Data:

df <- structure(list(Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, 
TRUE), Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE), Num_Occurances = c(2L, 
2L, 4L, 2L, 1L, 1L), Name = c("John Doe", "John Doe", "John Doe", 
"John Doe", "Sally Doe", "Sally Doe")), class = "data.frame", row.names = c(NA, 
-6L))

A base R solution with aggregate + transform, i.e.,

dfout <- aggregate(.~Name,
                   transform(df,
                             successful_outcome2 = Outcome2*Num_Occurances,
                             Total_Occurances = Num_Occurances,
                             successful_Outcome1 = Outcome1*Outcome2*Num_Occurances,
                             Total_Occurances_Outcome1 = Outcome1*Num_Occurances),
                   sum)[-(2:4)]

which yields

> dfout
       Name successful_outcome2 Total_Occurances successful_Outcome1 Total_Occurances_Outcome1
1  John Doe                   4               10                   2                         6
2 Sally Doe                   1                2                   0                         1
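
The multiplication works because logical values are coerced to 0/1 in arithmetic, so `Outcome2 * Num_Occurances` keeps the count where the outcome is TRUE and zeroes it elsewhere. A small sketch using John Doe's four rows (the vector names `outcome2` and `n` are just for illustration):

```r
# John Doe's Outcome2 flags and the matching counts from the example data
outcome2 <- c(FALSE, TRUE, FALSE, TRUE)
n <- c(2L, 2L, 4L, 2L)

outcome2 * n       # 0 2 0 2: FALSE acts as 0, TRUE as 1
sum(outcome2 * n)  # 4

# Logical indexing, as in the dplyr answer, gives the same total
sum(n[outcome2])   # 4
```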

Data

df <- structure(list(Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, 
TRUE), Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE), Num_Occurances = c(2L, 
2L, 4L, 2L, 1L, 1L), Name = c("John Doe", "John Doe", "John Doe", 
"John Doe", "Sally Doe", "Sally Doe")), class = "data.frame", row.names = c(NA, 
-6L))