How to create new observations with the sum of a new group?
I have the following data frame:
gender age population
H 0-4 5
H 5-9 5
H 10-14 10
H 15-19 15
H 20-24 15
H 25-29 10
M 0-4 0
M 5-9 5
M 10-14 5
M 15-19 15
M 20-24 10
M 25-29 15
I need to regroup the age categories so that the data frame looks like this:
gender age population
H 0-14 20
H 15-19 15
H 20-29 25
M 0-14 10
M 15-19 15
M 20-29 25
I prefer dplyr, so if there is a way to do this with that package, I would appreciate it.
Split the age string with tidyr::separate(), then bin the lower bound with cut():
library(dplyr)
library(tidyr)
df1 %>%
  separate(age, into = c("age1", "age2"), sep = "-", convert = TRUE) %>%
  mutate(age = cut(age1,
                   breaks = c(0, 14, 19, 29),
                   labels = c("0-14", "15-19", "20-29"),
                   include.lowest = TRUE)) %>%
  group_by(gender, age) %>%
  summarise(population = sum(population))
# output
# gender age population
# (fctr) (fctr) (int)
# 1 H 0-14 20
# 2 H 15-19 15
# 3 H 20-29 25
# 4 M 0-14 10
# 5 M 15-19 15
# 6 M 20-29 25
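Since the question asks for dplyr, here is a minimal sketch that skips the string splitting and maps the old labels to the new ones directly with case_when(). The df1 object is rebuilt here from the table in the question, and the choice to list the labels by hand is an assumption that only suits a small, fixed set of age bands.

```r
library(dplyr)

# Sample data rebuilt from the table in the question
df1 <- data.frame(
  gender = rep(c("H", "M"), each = 6),
  age = rep(c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29"), times = 2),
  population = c(5, 5, 10, 15, 15, 10, 0, 5, 5, 15, 10, 15),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # Relabel the bands to merge; every other band keeps its label
  mutate(age = case_when(
    age %in% c("0-4", "5-9", "10-14") ~ "0-14",
    age %in% c("20-24", "25-29") ~ "20-29",
    TRUE ~ age
  )) %>%
  group_by(gender, age) %>%
  # .groups = "drop" needs dplyr 1.0.0 or later
  summarise(population = sum(population), .groups = "drop")

print(result)
```

With many age bands the cut() approach scales better, because only the break points need to be listed.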
A data.table solution, where dat is your table:
library(data.table)
dat <- as.data.table(dat)
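# Lower bound of each age band; as.character() guards against age being a factor
dat[ , mn := as.numeric(sapply(strsplit(as.character(age), "-"), "[[", 1))]
# Bin the lower bounds into the new age groups
dat[ , age := cut(mn, breaks = c(0, 14, 19, 29),
                  include.lowest = TRUE,
                  labels = c("0-14", "15-19", "20-29"))]
# Sum the population within each gender and new age group
dat[ , list(population = sum(population)), by = list(gender, age)]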
# gender age population
# 1: H 0-14 20
# 2: H 15-19 15
# 3: H 20-29 25
# 4: M 0-14 10
# 5: M 15-19 15
# 6: M 20-29 25
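Both answers rely on cut() applied to the lower bound of each age band. As a small sketch of why include.lowest = TRUE matters (the variable names lower, bins and no_lowest are only illustrative): the intervals are right-closed by default, so the first one would otherwise be (0, 14] and a lower bound of 0 would fall outside every bin.

```r
# Lower bounds of the original age bands
lower <- c(0, 5, 10, 15, 20, 25)

# Intervals are [0, 14], (14, 19], (19, 29] when include.lowest = TRUE
bins <- cut(lower,
            breaks = c(0, 14, 19, 29),
            labels = c("0-14", "15-19", "20-29"),
            include.lowest = TRUE)
print(data.frame(lower, bins))

# Without include.lowest the value 0 matches no interval and becomes NA
no_lowest <- cut(0, breaks = c(0, 14, 19, 29))
print(is.na(no_lowest))
```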