Combine multiple rows to one row per group
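I feel my problem should be easy to solve, but I can't seem to figure it out.
I want to combine multiple rows that belong to the same group so that there is one row per group. That row should hold the sum of some variables and the mean of others. In the example I only included the variable treatment, for which I need the sum over the rows of each episode group. Df is my data; Df2 shows the result I am after.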
Df <- data.frame(country = c("A", "A", "A", "A", "A", "B","B", "B", "B"),
year = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"),
time1 = c("1950", "1951", "1951", "1953", "1954", "1950", "1951", "1952", "1952"),
time2 = c("1951", "1953", "1953", "1954", "1955", "1951", "1952", "1954", "1954"),
episode = c("1", "2", "2", "3", "4", "1", "2", "3", "3"),
status = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10))
Df2 <- data.frame(country = c("A", "A", "A", "A", "B", "B", "B"),
time1 = c("1950", "1951", "1953", "1954", "1950", "1951", "1952"),
time2 = c("1951", "1953", "1954", "1955", "1951", "1952", "1954"),
episode = c("1", "2", "3", "4", "1", "2", "3"),
status = c(0, 1, 0, 1, 1, 0, 1),
treatment = c(10, 20, 5, 0, 0, 30, 110))
Any ideas on how to solve this?
How about this?
library(tidyverse)
Df2 %>%
  filter(!is.na(treatment)) %>%
  group_by(episode) %>%
  summarise(sumTreatment = sum(treatment))
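Update after clarification (see the comments):
- The treatment column is character, because the missing values were entered as the quoted string "NA". With type.convert(as.is = TRUE) we convert it to integer, which also turns those "NA" strings into real missing values.
- Then we group, and
- we use summarise with sum.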
library(dplyr)
Df %>%
type.convert(as.is=TRUE) %>%
group_by(country, time1, time2, episode, status) %>%
summarise(treatment = sum(treatment, na.rm = TRUE))
  country time1 time2 episode status treatment
  <chr>   <int> <int>   <int>  <int>     <int>
1 A        1950  1951       1      0        10
2 A        1951  1953       2      1        20
3 A        1953  1954       3      0         5
4 A        1954  1955       4      1         0
5 B        1950  1951       1      1         0
6 B        1951  1952       2      0        30
7 B        1952  1954       3      1       110
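The question also mentions wanting the mean of some variables rather than the sum. Here is a minimal sketch of how that could look in the same summarise call. Taking the mean of status is only an illustration, since the question does not say which variables should be averaged, and dropping status from the grouping columns is likewise an assumption. The .groups = "drop" argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data from the question; "NA" is a quoted string here,
# so treatment starts out as a character column.
Df <- data.frame(
  country   = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  year      = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"),
  time1     = c("1950", "1951", "1951", "1953", "1954", "1950", "1951", "1952", "1952"),
  time2     = c("1951", "1953", "1953", "1954", "1955", "1951", "1952", "1954", "1954"),
  episode   = c("1", "2", "2", "3", "4", "1", "2", "3", "3"),
  status    = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
  treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10)
)

result <- Df %>%
  type.convert(as.is = TRUE) %>%                 # character columns -> integer, "NA" -> NA
  group_by(country, time1, time2, episode) %>%   # assumption: status is not a grouping column
  summarise(
    treatment = sum(treatment, na.rm = TRUE),    # sum for this variable
    status    = mean(status, na.rm = TRUE),      # mean for this one (illustrative choice)
    .groups   = "drop"                           # return an ungrouped tibble
  )

print(result)
```

With this data the treatment column should match the one in Df2. A group whose treatment values are all NA ends up with a sum of 0, because sum(..., na.rm = TRUE) of an empty set is 0.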
First answer:
library(dplyr)
Df %>%
type.convert(as.is=TRUE) %>%
group_by(episode) %>%
summarise(sumTreatment=sum(treatment, na.rm = TRUE))
  episode sumTreatment
    <int>        <int>
1       1           10
2       2           50
3       3          115
4       4            0
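If dplyr is not available, roughly the same per-episode sum can be sketched in base R with aggregate(). This is an alternative to the code above, not what the answer itself uses. The name Df_conv is just a helper introduced here. Note na.action = na.pass: the formula interface drops rows containing NA by default, which would make episode 4 disappear entirely.

```r
# Example data from the question (treatment is character because of the quoted "NA").
Df <- data.frame(
  country   = c("A", "A", "A", "A", "A", "B", "B", "B", "B"),
  year      = c("1950", "1951", "1952", "1953", "1954", "1950", "1951", "1952", "1953"),
  time1     = c("1950", "1951", "1951", "1953", "1954", "1950", "1951", "1952", "1952"),
  time2     = c("1951", "1953", "1953", "1954", "1955", "1951", "1952", "1954", "1954"),
  episode   = c("1", "2", "2", "3", "4", "1", "2", "3", "3"),
  status    = c(0, 1, 1, 0, 1, 1, 0, 1, 1),
  treatment = c(10, "NA", 20, 5, "NA", "NA", 30, 100, 10)
)

# Convert character columns to their natural types; "NA" strings become real NA.
Df_conv <- type.convert(Df, as.is = TRUE)

# Keep the NA rows (na.pass) and let sum() ignore them (na.rm = TRUE).
base_result <- aggregate(
  treatment ~ episode,
  data      = Df_conv,
  FUN       = sum,
  na.rm     = TRUE,
  na.action = na.pass
)

print(base_result)
```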