Combine multiple rows to one row per group

I think my problem is easy to solve, but I can't seem to figure it out.

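I want to combine multiple rows that belong to the same group so that there is one row per group. That row should hold the sum for some variables and the mean for others. In the example I only included the variable treatment, for which I need the sum over the rows of each episode group. Df is my data and Df2 is the result I am after:
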
Df <- data.frame(country = c("A", "A", "A", "A", "A", "B","B", "B", "B"),
                 year = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"), 
                 time1 = c("1950", "1951", "1951", "1953", "1954", "1950", "1951", "1952", "1952"), 
                 time2 = c("1951", "1953", "1953", "1954", "1955", "1951", "1952", "1954", "1954"),
                 episode = c("1", "2", "2", "3", "4", "1", "2", "3", "3"),
                 status = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
                 treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10))

Df2 <- data.frame(country = c("A", "A", "A", "A", "B", "B", "B"),
                   time1 = c("1950", "1951", "1953", "1954", "1950", "1951", "1952"), 
                   time2 = c("1951", "1953", "1954", "1955", "1951", "1952", "1954"),
                   episode = c("1", "2", "3", "4", "1", "2", "3"),
                   status = c(0, 1, 0, 1, 1, 0, 1),
                   treatment = c(10, 20, 5, 0, 0, 30, 110))

Any ideas on how to solve this?

How about this?

library(tidyverse)
Df2 %>%
  filter(!is.na(treatment)) %>%
  group_by(episode) %>%
  summarise(sumTreatment = sum(treatment))

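Update after the clarification (see comments):

  1. The treatment column is character. With type.convert(as.is = TRUE) we convert it to integer.
  2. Then we group by country, time1, time2, episode and status, and
  3. we use summarise() with sum() to total treatment within each group.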
library(dplyr)
Df %>% 
  type.convert(as.is=TRUE) %>% 
  group_by(country, time1, time2, episode, status) %>%
  summarise(treatment = sum(treatment, na.rm = TRUE))
  country time1 time2 episode status treatment
  <chr>   <int> <int>   <int>  <int>     <int>
1 A        1950  1951       1      0        10
2 A        1951  1953       2      1        20
3 A        1953  1954       3      0         5
4 A        1954  1955       4      1         0
5 B        1950  1951       1      1         0
6 B        1951  1952       2      0        30
7 B        1952  1954       3      1       110
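
To see why the conversion step matters, here is a small check. It is only a sketch on a standalone vector holding the same values as the treatment column, not part of the pipeline above: mixing numbers with the quoted string "NA" inside c() yields a character vector, and type.convert(as.is = TRUE) turns it back into integers with real missing values.

```r
# Same values as the treatment column in the question
treatment <- c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10)
class(treatment)                 # "character": the quoted "NA" coerces every element to a string

# type.convert() treats the string "NA" as missing (its default na.strings)
converted <- type.convert(treatment, as.is = TRUE)
class(converted)                 # "integer"
sum(is.na(converted))            # 3
sum(converted, na.rm = TRUE)     # 175
```

Without the conversion, sum() would stop with an error because it cannot add character values.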

First answer:

library(dplyr)
Df %>% 
  type.convert(as.is=TRUE) %>% 
  group_by(episode) %>% 
  summarise(sumTreatment=sum(treatment, na.rm = TRUE))
  episode sumTreatment
    <int>        <int>
1       1           10
2       2           50
3       3          115
4       4            0
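
The question also asks for the sum of some variables and the mean of others in the same collapsed row. A minimal sketch of how that could look in one summarise() call is below. The choice of status as the column to average, the name status_mean, and grouping by country and episode only are assumptions made for illustration; the data is a trimmed copy of Df from the question.

```r
library(dplyr)

# Trimmed copy of Df from the question (the "NA" strings make treatment character)
Df <- data.frame(
  country   = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  episode   = c("1", "2", "2", "3", "4", "1", "2", "3", "3"),
  status    = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
  treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10)
)

res <- Df %>%
  type.convert(as.is = TRUE) %>%                # "NA" strings become real NA values
  group_by(country, episode) %>%
  summarise(
    treatment   = sum(treatment, na.rm = TRUE), # sum within each group
    status_mean = mean(status, na.rm = TRUE),   # mean within each group (illustrative column choice)
    .groups = "drop"                            # return an ungrouped tibble
  )
res
```

Each column gets its own aggregation function inside summarise(), so more sums or means can be added the same way. Groups whose treatment values are all missing end up with a sum of 0 because of na.rm = TRUE, which matches the desired Df2.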