Calculate min and max (range) by group
I have something like this in a data frame:
PersonId Date_Withdrawal
A 2012-05-01
A 2012-06-01
B 2012-05-01
C 2012-05-01
A 2012-07-01
A 2012-10-01
B 2012-08-01
B 2012-12-01
C 2012-07-01
I would like to get the minimum and maximum dates by 'PersonId'.
First, convert the column to a proper date class (always a good practice); then you can run a simple range by group. Here is an attempt:
library(data.table)
setDT(df)[, Date_Withdrawal := as.IDate(Date_Withdrawal)]
df[, as.list(range(Date_Withdrawal)), by = PersonId]
# PersonId V1 V2
# 1: A 2012-05-01 2012-10-01
# 2: B 2012-05-01 2012-12-01
# 3: C 2012-05-01 2012-07-01
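The range call above leaves the result columns with the default names V1 and V2. If named columns are preferred, a minimal sketch along the same lines is shown here; the object names dt and res and the column names Min and Max are chosen for illustration, and the data is rebuilt from the question so the snippet runs on its own.

```r
library(data.table)

# Example data copied from the question; dt is a hypothetical name.
dt <- data.table(
  PersonId = c("A", "A", "B", "C", "A", "A", "B", "B", "C"),
  Date_Withdrawal = as.IDate(c("2012-05-01", "2012-06-01", "2012-05-01",
                               "2012-05-01", "2012-07-01", "2012-10-01",
                               "2012-08-01", "2012-12-01", "2012-07-01"))
)

# Compute min and max separately so each result column gets an explicit name.
res <- dt[, .(Min = min(Date_Withdrawal), Max = max(Date_Withdrawal)),
          by = PersonId]
print(res)
```

Groups appear in order of first occurrence in the data, which is how data.table orders results when grouping with by.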
Or
library(dplyr)
df %>%
  mutate(Date_Withdrawal = as.Date(Date_Withdrawal)) %>%
  group_by(PersonId) %>%
  summarise(Min = min(Date_Withdrawal), Max = max(Date_Withdrawal))
# Source: local data frame [3 x 3]
#
# PersonId Min Max
# (fctr) (date) (date)
# 1 A 2012-05-01 2012-10-01
# 2 B 2012-05-01 2012-12-01
# 3 C 2012-05-01 2012-07-01
P.S. A base R aggregate call would look like aggregate(as.Date(Date_Withdrawal) ~ PersonId, df, range), but it refuses to keep the Date class.
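For a base R route that does keep the Date class, one possible workaround is to split the dates by person and take the minimum and maximum separately, instead of passing range to aggregate. This is only a sketch under that assumption, not the approach from the post; the names by_person and res are illustrative, and the data is rebuilt from the question.

```r
# Example data copied from the question.
df <- data.frame(
  PersonId = c("A", "A", "B", "C", "A", "A", "B", "B", "C"),
  Date_Withdrawal = c("2012-05-01", "2012-06-01", "2012-05-01", "2012-05-01",
                      "2012-07-01", "2012-10-01", "2012-08-01", "2012-12-01",
                      "2012-07-01")
)
df$Date_Withdrawal <- as.Date(df$Date_Withdrawal)

# One Date vector per person.
by_person <- split(df$Date_Withdrawal, df$PersonId)

# do.call(c, ...) concatenates the per-group Date values via c.Date,
# so the class survives; unname() drops the group names from the vectors.
res <- data.frame(
  PersonId = names(by_person),
  Min = unname(do.call(c, lapply(by_person, min))),
  Max = unname(do.call(c, lapply(by_person, max)))
)
print(res)
```

Taking min and max as two scalar summaries avoids the two-element result of range, which aggregate stores as a matrix column and which therefore cannot carry the Date class.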