Combination of reshaping and aggregating a data.frame
I'm still fairly new, so please forgive any missteps, but I currently have a data.frame that looks like this.
Outcome1 Outcome2 Num_Occurances Name
False False 2 John Doe
False True 2 John Doe
True False 4 John Doe
True True 2 John Doe
False True 1 Sally Doe
True False 1 Sally Doe
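I want to reshape and aggregate the data into a wider format that ends up looking like this:
- successful_Outcome2 is the sum of the True values in Outcome2
- Total_Occurances is the sum of Num_Occurances for each name
- successful_Outcome1 is where both Outcome1 and Outcome2 are True
- Total_Occurances_Outcome1 is the sum of all True responses in the Outcome1 category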
Name successful_Outcome2 Total_Occurances successful_Outcome1 Total_Occurances_Outcome1
John Doe 4 10 2 6
Sally Doe 1 2 0 1
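I know the dcast function can be used to melt and recast data into a wide format, but the combination of different outcomes has me stumped. Any help would be appreciated!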
Using dplyr:
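library(dplyr)

df %>%
  # make sure the Outcome columns are logical (they may have been read in as text)
  mutate_at(vars(starts_with('Outcome')), as.logical) %>%
  group_by(Name) %>%
  # a logical vector inside [ ] keeps only the counts where the condition is TRUE
  summarise(successful_Outcome2 = sum(Num_Occurances[Outcome2]),
            Total_Occurances = sum(Num_Occurances),
            successful_outcome1 = sum(Num_Occurances[Outcome1 & Outcome2]),
            Total_Occurances_Outcome1 = sum(Num_Occurances[Outcome1]))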
# Name successful_Outcome2 Total_Occurances successful_outcome1 Total_Occurances_Outcome1
# <fct> <int> <int> <int> <int>
#1 JohnDoe 4 10 2 6
#2 SallyDoe 1 2 0 1
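To see why `sum(Num_Occurances[Outcome2])` gives the requested totals, here is a minimal base R sketch that works through one group by hand. It rebuilds the question's sample data; the names `john` and `john_summary` are only illustrative and do not appear in the answer above.

```r
# Sample data from the question.
df <- data.frame(
  Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE),
  Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  Num_Occurances = c(2L, 2L, 4L, 2L, 1L, 1L),
  Name = c("John Doe", "John Doe", "John Doe",
           "John Doe", "Sally Doe", "Sally Doe")
)

# Pick out a single group: the four rows for John Doe.
john <- df[df$Name == "John Doe", ]

# Indexing the counts with a logical vector keeps only the TRUE positions,
# so each sum() adds up the counts for the rows that meet the condition.
john_summary <- c(
  successful_Outcome2       = sum(john$Num_Occurances[john$Outcome2]),
  Total_Occurances          = sum(john$Num_Occurances),
  successful_outcome1       = sum(john$Num_Occurances[john$Outcome1 & john$Outcome2]),
  Total_Occurances_Outcome1 = sum(john$Num_Occurances[john$Outcome1])
)
print(john_summary)
```

The four values should match the John Doe row of the desired output (4, 10, 2, 6); `group_by(Name)` simply repeats this calculation for every name.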
One approach is to expand the data frame first and then summarise everything:
library(dplyr)
library(tidyr)

# repeat each row Num_Occurances times, then drop that column (column 3)
df[rep(seq_len(nrow(df)), df$Num_Occurances), -3] %>%
  group_by(Name) %>%
  summarise(successful_outcome2 = sum(Outcome2),
            Total_Occurances = n(),
            successful_outcome1 = sum(Outcome1 & Outcome2),
            Total_Occurances_Outcome1 = sum(Outcome1))
# A tibble: 2 x 5
Name successful_outcome2 Total_Occurances successful_outcome1 Total_Occurances_Outcome1
<chr> <int> <int> <int> <int>
1 John Doe 4 10 2 6
2 Sally Doe 1 2 0 1
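The expansion step relies on `rep()` accepting a vector of repeat counts. The sketch below shows just that piece in isolation, using the Num_Occurances values from the question; `counts` and `idx` are illustrative names, not part of the answer.

```r
# Num_Occurances from the question, one entry per original row.
counts <- c(2L, 2L, 4L, 2L, 1L, 1L)

# rep() with a vector of counts repeats the i-th row number counts[i] times.
idx <- rep(seq_along(counts), counts)
print(idx)
# 1 1 2 2 3 3 3 3 4 4 5 6

# The expanded data frame has one row per occurrence: 10 for John, 2 for Sally.
print(length(idx))
# 12
```

Using `idx` as the row index is what turns counting rows with `n()` or summing a logical column into a weighted total. If tidyr is available, `uncount(df, Num_Occurances)` is a packaged way to do the same expansion.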
Data:
df <- structure(list(Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE,
TRUE), Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE), Num_Occurances = c(2L,
2L, 4L, 2L, 1L, 1L), Name = c("John Doe", "John Doe", "John Doe",
"John Doe", "Sally Doe", "Sally Doe")), class = "data.frame", row.names = c(NA,
-6L))
A base R solution with aggregate + transform, i.e.,
dfout <- aggregate(. ~ Name,
                   transform(df,
                             successful_outcome2 = Outcome2 * Num_Occurances,
                             Total_Occurances = Num_Occurances,
                             successful_Outcome1 = Outcome1 * Outcome2 * Num_Occurances,
                             Total_Occurances_Outcome1 = Outcome1 * Num_Occurances),
                   sum)[-(2:4)]
which yields
> dfout
Name successful_outcome2 Total_Occurances successful_Outcome1 Total_Occurances_Outcome1
1 John Doe 4 10 2 6
2 Sally Doe 1 2 0 1
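The `transform()` step works because R treats TRUE as 1 and FALSE as 0 in arithmetic, so multiplying a logical column by the counts zeroes out the rows that do not qualify. A small sketch on John Doe's four rows, with illustrative variable names that are not part of the answer:

```r
# John Doe's rows from the question.
outcome1 <- c(FALSE, FALSE, TRUE, TRUE)
outcome2 <- c(FALSE, TRUE, FALSE, TRUE)
counts   <- c(2L, 2L, 4L, 2L)

# FALSE * n is 0 and TRUE * n is n, so only qualifying rows contribute.
weighted_o2   <- outcome2 * counts             # 0 2 0 2
weighted_both <- outcome1 * outcome2 * counts  # 0 0 0 2
weighted_o1   <- outcome1 * counts             # 0 0 4 2

print(sum(weighted_o2))    # 4
print(sum(weighted_both))  # 2
print(sum(weighted_o1))    # 6

# Same totals as indexing the counts with the logical vector.
print(sum(counts[outcome2]) == sum(weighted_o2))  # TRUE
```

After `aggregate()` sums every column by Name, the `[-(2:4)]` drops the summed Outcome1, Outcome2 and Num_Occurances columns, which are no longer needed.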