Find data frame column differences per multiple groupings

Question

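In R, I want to subtract the sum of the value column grouped by each letter in column 't1' from the sum of the same value column grouped by the same letter in column 't2'. The process should be repeated for each letter and for each age group.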

Consider:

set.seed(3)
df <- data.frame(age   = rep(1:3, each = 25),
                 t1    = rep(expand.grid(LETTERS[1:5], LETTERS[1:5])[, 1], 3),
                 t2    = rep(expand.grid(LETTERS[1:5], LETTERS[1:5])[, 2], 3),
                 value = sample(1:10, 75, replace = TRUE))

This data frame has 3 values in the 'age' column, two categorical columns (t1 and t2) and an associated value column (value).
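To make the definition concrete, here is a minimal base R sketch that computes the quantity for a single letter ("A") and a single age group (1). It also shows why dropping the rows where t1 equals t2 (as the 'A' example in the question does) is optional: such a row adds its value to both sums, so it cancels out of the difference. The names grid, sub, off, change_all and change_offdiag are illustrative only.

```r
# Rebuild the example data from the question (same seed, same columns)
set.seed(3)
grid <- expand.grid(LETTERS[1:5], LETTERS[1:5])
df <- data.frame(age   = rep(1:3, each = 25),
                 t1    = rep(grid[, 1], 3),
                 t2    = rep(grid[, 2], 3),
                 value = sample(1:10, 75, replace = TRUE))

# Rows for one age group
sub <- df[df$age == 1, ]

# Sum of 'value' where "A" is in t2, minus the sum where "A" is in t1
change_all <- sum(sub$value[sub$t2 == "A"]) - sum(sub$value[sub$t1 == "A"])

# Same quantity after dropping the rows where t1 == t2
off <- sub[sub$t1 != sub$t2, ]
change_offdiag <- sum(off$value[off$t2 == "A"]) - sum(off$value[off$t1 == "A"])

# The (A, A) row contributes equally to both sums, so both results agree
identical(change_all, change_offdiag)
```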

As an example, here is how it works for 'A':

library(plyr)

# extract rows with A in either category column
df2 <- df[df$t1 == "A" | df$t2 == "A", ]
# remove rows where t1 and t2 are the same (not needed)
df2 <- df2[df2$t1 != df2$t2, ]
# per age: sum of 'value' for A in t2 minus sum of 'value' for A in t1
df2 <- ddply(df2, .(age), transform,
             change = sum(value[t2 == "A"]) - sum(value[t1 == "A"]))
# create a name
df2$cat <- "A"
# keep one summary row per age (de-duplicating on 'change' would also
# drop an age group whose change happens to equal another age's)
df2 <- df2[!duplicated(df2$age), ]
# keep summary data: age, cat, change
df2 <- df2[, c("age", "cat", "change")]

Now I need to do this for every letter that occurs in t1 and t2 (in this example A, B, C, D and E), producing a 15-row summary (5 letters × 3 age groups).

I tried a loop:

for (c in as.character(unique(df$t1)))

but I'm at a loss.

Many thanks!
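One way to finish the loop idea is to repeat the per-letter calculation for every age and every letter and then stack the one-row results. The sketch below does this in plain base R (no plyr) and assumes the letters of interest are all those appearing in t1 or t2; the names grid, letters_used, rows and summary_df are illustrative. The loop variable is called letter rather than c simply to avoid confusion with the base function c().

```r
set.seed(3)
grid <- expand.grid(LETTERS[1:5], LETTERS[1:5])
df <- data.frame(age   = rep(1:3, each = 25),
                 t1    = rep(grid[, 1], 3),
                 t2    = rep(grid[, 2], 3),
                 value = sample(1:10, 75, replace = TRUE))

# Every letter that occurs in either category column
letters_used <- sort(unique(c(as.character(df$t1), as.character(df$t2))))

rows <- list()
for (a in sort(unique(df$age))) {
  sub <- df[df$age == a, ]
  for (letter in letters_used) {
    # sum of 'value' with the letter in t2 minus sum with the letter in t1
    change <- sum(sub$value[sub$t2 == letter]) - sum(sub$value[sub$t1 == letter])
    rows[[length(rows) + 1]] <- data.frame(age = a, cat = letter, change = change)
  }
}

# Stack the one-row data frames: one row per age/letter combination
summary_df <- do.call(rbind, rows)
summary_df
```

Growing a list and calling do.call(rbind, ...) once at the end avoids repeatedly copying a growing data frame inside the loop.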

Answer 1

I'd suggest tidying your data first; after summarise() you can spread() the result and add a new column:

# Make reproducible
set.seed(4)
df <- data.frame(age = rep(1:3,each=25),
                 t1 = rep(expand.grid(LETTERS[1:5],LETTERS[1:5])[,1],3),
                 t2 = rep(expand.grid(LETTERS[1:5],LETTERS[1:5])[,2],3),
                 value = sample(1:10,75,replace=T))

library(tidyr)
library(dplyr)

df_tidy <- gather(df, t_var, t_val, -age, -value)
sample_n(df_tidy, 3)
#      age value t_var t_val
#  104   2     6    t2     A
#  48    2     9    t1     C
#  66    3     7    t1     A

df_tidy %>%
  group_by(age, t_var, t_val) %>%
  summarise(val_sum = sum(value)) %>%
  spread(t_var, val_sum) %>%
  mutate(diff = t1 - t2)

#      age t_val    t1    t2  diff
#    (int) (chr) (int) (int) (int)
# 1      1     A    30    22     8
# 2      1     B    32    32     0
# 3      1     C    27    28    -1
# 4      1     D    38    39    -1
# 5      1     E    30    36    -6
# 6      2     A    36    35     1
# 7      2     B    26    30    -4
# 8      2     C    40    27    13
# 9      2     D    27    31    -4
# 10     2     E    28    34    -6
# 11     3     A    26    39   -13
# 12     3     B    19    26    -7
# 13     3     C    31    29     2
# 14     3     D    41    33     8
# 15     3     E    39    29    10
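In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The sketch below is the same pipeline written with those functions, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument of summarise()). It computes t2 - t1, the sign used for change in the question, whereas the answer above reports t1 - t2. The printed numbers may not match the output shown above, because R 3.6.0 changed the default algorithm used by sample().

```r
library(dplyr)
library(tidyr)

set.seed(4)
grid <- expand.grid(LETTERS[1:5], LETTERS[1:5])
df <- data.frame(age   = rep(1:3, each = 25),
                 t1    = rep(grid[, 1], 3),
                 t2    = rep(grid[, 2], 3),
                 value = sample(1:10, 75, replace = TRUE))

result <- df %>%
  # long format: each original row appears once for t1 and once for t2
  pivot_longer(c(t1, t2), names_to = "t_var", values_to = "t_val") %>%
  # total value per age, source column (t1 or t2) and letter
  group_by(age, t_var, t_val) %>%
  summarise(val_sum = sum(value), .groups = "drop") %>%
  # one column per source column, then take the difference
  pivot_wider(names_from = t_var, values_from = val_sum) %>%
  mutate(diff = t2 - t1)

result
```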

Answer 2

Here is a base R solution using aggregate and merge:

# aggregate by age and t1 or t2
t1Agg <- aggregate(value ~ t1 + age, data=df, FUN=sum)
t2Agg <- aggregate(value ~ t2 + age, data=df, FUN=sum)

# merge aggregated data
aggData <- merge(t1Agg, t2Agg, by.x=c("age","t1"), by.y=c("age","t2"))
names(aggData) <- c("age", "t", "value.t1", "value.t2")

aggData$diff <- aggData$value.t1 - aggData$value.t2
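A more compact base R variant, offered as an alternative rather than as part of the answer above, builds two age-by-letter matrices of sums with tapply() and subtracts them. Because t1 and t2 are factors with the same levels, both matrices have the same dimnames and the subtraction lines up cell by cell. This sketch uses t2 - t1 (the question's sign), so it is the negative of aggData$diff; the names sum_t1, sum_t2, diff_mat and diff_long are illustrative.

```r
set.seed(3)
grid <- expand.grid(LETTERS[1:5], LETTERS[1:5])
df <- data.frame(age   = rep(1:3, each = 25),
                 t1    = rep(grid[, 1], 3),
                 t2    = rep(grid[, 2], 3),
                 value = sample(1:10, 75, replace = TRUE))

# age x letter matrices of summed values; a named INDEX list gives named dimnames
sum_t1 <- tapply(df$value, list(age = df$age, t = df$t1), sum)
sum_t2 <- tapply(df$value, list(age = df$age, t = df$t2), sum)

# cell-by-cell difference, t2 - t1
diff_mat <- sum_t2 - sum_t1
diff_mat

# back to long format: one row per age/letter, columns age, t, diff
diff_long <- as.data.frame(as.table(diff_mat), responseName = "diff")
diff_long
```

If a letter never appeared in one of the columns for some age, tapply() would return NA for that cell, so this form relies on every combination being present, as it is in the example data.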

Tags: aggregate, r, plyr, dataframe