Combination of reshaping and aggregating a data.frame

Question

I'm still fairly new to this, so please forgive any mistakes, but I currently have a data.frame that looks like this:

Outcome1  Outcome2  Num_Occurances Name
False       False       2          John Doe
False       True        2          John Doe
True        False       4          John Doe
True        True        2          John Doe
False       True        1          Sally Doe
True        False       1          Sally Doe

I would like to reshape and aggregate the data into a wider format that ends up looking like this:

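successful_Outcome2 is the sum of Num_Occurances for the rows where Outcome2 is True
Total_Occurances is the sum of Num_Occurances for each Name
successful_Outcome1 is the sum of Num_Occurances for the rows where both Outcome1 and Outcome2 are True
Total_Occurances_Outcome1 is the sum of Num_Occurances for all rows where Outcome1 is True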

Name        successful_Outcome2  Total_Occurances  successful_Outcome1    Total_Occurances_Outcome1 
John Doe           4                  10                   2                        6
Sally Doe          1                   2                   0                        1

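I know that dcast (together with melt) can be used to melt data and recast it into a wide format, but combining the different outcomes has me stuck. Any help would be greatly appreciated!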

Answer 1

Using dplyr:

library(dplyr)

df %>%
  # convert the Outcome columns to logical (also handles "True"/"False" text)
  mutate_at(vars(starts_with('Outcome')), as.logical) %>%
  group_by(Name) %>%
  # logical subsetting keeps only the counts of the rows matching each condition
  summarise(successful_Outcome2 = sum(Num_Occurances[Outcome2]),
            Total_Occurances = sum(Num_Occurances),
            successful_outcome1 = sum(Num_Occurances[Outcome1 & Outcome2]),
            Total_Occurances_Outcome1 = sum(Num_Occurances[Outcome1]))


#  Name      successful_Outcome2 Total_Occurances successful_outcome1 Total_Occurances_Outcome1
#  <fct>                   <int>            <int>               <int>                     <int>
#1 John Doe                    4               10                   2                         6
#2 Sally Doe                   1                2                   0                         1
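In dplyr 1.0.0 and later, `mutate_at()` is superseded by `across()`. The following is a minimal, self-contained sketch of the same summary written with `across()`, assuming the sample data from the question (here the Outcome columns are already logical; `as.logical()` would also convert the strings "True"/"False" if the data had been read in as text). The variable name `res` is only for illustration.

```r
library(dplyr)

# Sample data from the question (Outcome columns already logical here)
df <- data.frame(
  Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
  Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  Num_Occurances = c(2L, 2L, 4L, 2L, 1L, 1L),
  Name = c("John Doe", "John Doe", "John Doe", "John Doe", "Sally Doe", "Sally Doe")
)

res <- df %>%
  # across() replaces the superseded mutate_at(vars(...), ...)
  mutate(across(starts_with("Outcome"), as.logical)) %>%
  group_by(Name) %>%
  summarise(
    successful_Outcome2       = sum(Num_Occurances[Outcome2]),
    Total_Occurances          = sum(Num_Occurances),
    successful_Outcome1       = sum(Num_Occurances[Outcome1 & Outcome2]),
    Total_Occurances_Outcome1 = sum(Num_Occurances[Outcome1])
  )

print(res)
```

When no row matches a condition (for example both outcomes True for Sally Doe), the subset is empty and `sum()` returns 0, so no special handling is needed.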

Answer 2

One approach is to first expand the data frame (repeating each row Num_Occurances times) and then summarise everything:

library(dplyr)
library(tidyr)

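# repeat each row Num_Occurances times and drop that column (column 3)
df[rep(1:nrow(df), df$Num_Occurances), -3] %>%
  group_by(Name) %>%
  # after expansion each row is one occurrence, so counts are sums of logicals
  summarise(successful_outcome2=sum(Outcome2),
            Total_Occurances=n(),
            successful_outcome1=sum(Outcome1 & Outcome2),
            Total_Occurances_Outcome1=sum(Outcome1))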
# A tibble: 2 x 5
  Name  successful_outcome2 Total_Occurances successful_outcome1 Total_Occurances_Outcome1
  <chr>                <int>            <int>              <int>                     <int>
1 John Doe                 4               10                   2                        6
2 Sally Doe                1                2                   0                        1

Data:

df <- structure(list(Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, 
TRUE), Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE), Num_Occurances = c(2L, 
2L, 4L, 2L, 1L, 1L), Name = c("John Doe", "John Doe", "John Doe", 
"John Doe", "Sally Doe", "Sally Doe")), class = "data.frame", row.names = c(NA, 
-6L))
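The row expansion above can also be written with `tidyr::uncount()`, which repeats each row according to a weights column and, by default, drops that column. A minimal self-contained sketch, assuming the same data as above (the name `res2` is only illustrative):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
  Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  Num_Occurances = c(2L, 2L, 4L, 2L, 1L, 1L),
  Name = c("John Doe", "John Doe", "John Doe", "John Doe", "Sally Doe", "Sally Doe")
)

res2 <- df %>%
  # one row per occurrence; the Num_Occurances column is removed afterwards
  uncount(Num_Occurances) %>%
  group_by(Name) %>%
  summarise(
    successful_outcome2       = sum(Outcome2),
    Total_Occurances          = n(),
    successful_outcome1       = sum(Outcome1 & Outcome2),
    Total_Occurances_Outcome1 = sum(Outcome1)
  )

print(res2)
```

This avoids the positional `-3` column index, so it keeps working if the column order changes; the trade-off of either expansion approach is that it materialises one row per occurrence, which can be large when the counts are big.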

Answer 3

A base R solution using aggregate + transform:

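dfout <- aggregate(.~Name,
                   # add one column per target statistic, weighted by Num_Occurances
                   transform(df,
                             successful_outcome2 = Outcome2*Num_Occurances,
                             Total_Occurances = Num_Occurances,
                             successful_Outcome1 = Outcome1*Outcome2*Num_Occurances,
                             Total_Occurances_Outcome1 = Outcome1*Num_Occurances),
                   sum)[-(2:4)]  # drop the summed Outcome1, Outcome2 and Num_Occurances columns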

which gives

> dfout
       Name successful_outcome2 Total_Occurances successful_Outcome1 Total_Occurances_Outcome1
1  John Doe                   4               10                   2                         6
2 Sally Doe                   1                2                   0                         1

Data

df <- structure(list(Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, 
TRUE), Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE), Num_Occurances = c(2L, 
2L, 4L, 2L, 1L, 1L), Name = c("John Doe", "John Doe", "John Doe", 
"John Doe", "Sally Doe", "Sally Doe")), class = "data.frame", row.names = c(NA, 
-6L))


Tags: aggregate, r, dataframe, dcast