Calculating number of days using criterion
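Overview
Hypothetical: suppose I am an avid sportsperson. I have a dataset with one record for every time I go boating/skiing/whatever. I want to calculate the number of days that have elapsed since the last hypothetical accident in each given region.
My data
Here is a small reproducible sample of my data: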
mydata <- data.frame(state = c(rep("Vermont", 5), rep("New Hampshire", 5)),
                     date = c("2016-01-01", "2016-01-03", "2016-01-04", "2016-01-04", "2016-02-01",
                              "2016-01-03", "2016-01-15", "2016-01-16", "2016-02-01", "2016-02-03"),
                     accident = c(1, 0, 0, 1, 1,
                                  0, 1, 1, 0, 1))
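Here is what it looks like. Note that the dates are "ragged": sometimes I ski/boat two days in a row, and sometimes I take a week off. Also note that I do these sports in multiple states. I want to group_by state, so that the first time I ski/boat in a region I get an NA value.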
state          date        accident
Vermont        2016-01-01  1
Vermont        2016-01-02  0
Vermont        2016-01-03  0
Vermont        2016-01-04  1
Vermont        2016-02-01  1
New Hampshire  2016-01-03  0
New Hampshire  2016-01-15  1
New Hampshire  2016-01-16  1
New Hampshire  2016-02-01  0
New Hampshire  2016-02-03  1
I want to produce this:
state          date        accident  numdays
Vermont        2016-01-01  1         NA
Vermont        2016-01-02  0         1
Vermont        2016-01-03  0         2
Vermont        2016-01-04  1         3
Vermont        2016-02-01  1         28
New Hampshire  2016-01-03  0         NA
New Hampshire  2016-01-15  1         NA
New Hampshire  2016-01-16  1         1
New Hampshire  2016-02-01  0         16
New Hampshire  2016-02-03  1         18
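*Edited to correct a typo in the data
Here is one option:
Create a new column that holds the date on rows where an accident occurred. Shift it down one row within each state, then use tidyr::fill to fill the rows without an accident with the previous value. Finally, for each state, calculate the number of days that have passed since the last accident.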
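library(dplyr)

mydata %>%
  mutate(date = as.Date(date),
         # keep the date on accident rows, NA on all other rows
         numdays = replace(date, accident == 0, NA)) %>%
  group_by(state) %>%
  # shift by one row so an accident does not count toward its own row
  mutate(numdays = lag(numdays)) %>%
  # carry the most recent earlier accident date forward within each state
  tidyr::fill(numdays) %>%
  mutate(numdays = as.integer(date - numdays))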
#    state         date       accident numdays
#    <chr>         <date>        <dbl>   <int>
#  1 Vermont       2016-01-01        1      NA
#  2 Vermont       2016-01-02        0       1
#  3 Vermont       2016-01-03        0       2
#  4 Vermont       2016-01-04        1       3
#  5 Vermont       2016-02-01        1      28
#  6 New Hampshire 2016-01-03        0      NA
#  7 New Hampshire 2016-01-15        1      NA
#  8 New Hampshire 2016-01-16        1       1
#  9 New Hampshire 2016-02-01        0      16
# 10 New Hampshire 2016-02-03        1      18
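One thing worth keeping in mind is that lag() and fill() both look at the previous row, so this approach relies on the rows being in date order within each state. The following is a minimal sketch, assuming the input might arrive unsorted; it adds an arrange() step before the same pipeline. The shuffled copy, the fixed seed, and the names `shuffled` and `result` are illustrative assumptions and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Corrected sample data (same values as in the Data section of this answer)
mydata <- data.frame(
  state = c(rep("Vermont", 5), rep("New Hampshire", 5)),
  date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-02-01",
           "2016-01-03", "2016-01-15", "2016-01-16", "2016-02-01", "2016-02-03"),
  accident = c(1, 0, 0, 1, 1,
               0, 1, 1, 0, 1)
)

# Simulate unsorted input (illustrative only)
set.seed(1)
shuffled <- mydata[sample(nrow(mydata)), ]

result <- shuffled %>%
  mutate(date = as.Date(date)) %>%
  # restore chronological order within each state before using lag()/fill()
  arrange(state, date) %>%
  mutate(numdays = replace(date, accident == 0, NA)) %>%
  group_by(state) %>%
  mutate(numdays = lag(numdays)) %>%
  fill(numdays) %>%
  mutate(numdays = as.integer(date - numdays)) %>%
  ungroup()

print(result)
```

Because arrange() sorts by state first, the New Hampshire rows come before the Vermont rows in `result`; the per-state values of numdays are the same as in the output above.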
Data
There may be a typo in the date entries, which I have corrected below.
mydata <- data.frame(state = c(rep("Vermont", 5), rep("New Hampshire", 5)),
                     date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-02-01",
                              "2016-01-03", "2016-01-15", "2016-01-16", "2016-02-01", "2016-02-03"),
                     accident = c(1, 0, 0, 1, 1,
                                  0, 1, 1, 0, 1))
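If adding dplyr and tidyr is not an option, the same idea can probably be expressed in base R. The sketch below is an alternative, not the method used in this answer. The helper name `days_since_last` is hypothetical, and it assumes the rows are already in date order within each state and that accident is coded as 0/1.

```r
# Corrected sample data (same values as above)
mydata <- data.frame(
  state = c(rep("Vermont", 5), rep("New Hampshire", 5)),
  date = c("2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04", "2016-02-01",
           "2016-01-03", "2016-01-15", "2016-01-16", "2016-02-01", "2016-02-03"),
  accident = c(1, 0, 0, 1, 1,
               0, 1, 1, 0, 1)
)
mydata$date <- as.Date(mydata$date)

# Hypothetical helper: days since the most recent accident on an earlier row
days_since_last <- function(date, accident) {
  n <- length(date)
  # accident dates as day numbers, NA on rows without an accident
  acc_day <- ifelse(accident == 1, as.numeric(date), NA_real_)
  # shift down one row so a row never sees its own accident
  prev <- c(NA_real_, head(acc_day, -1))
  # index of the latest non-NA entry at or before each position (0 if none yet)
  idx <- cummax(ifelse(is.na(prev), 0L, seq_len(n)))
  # look up that entry; index 0 maps to the leading NA
  last <- c(NA_real_, prev)[idx + 1]
  as.integer(as.numeric(date) - last)
}

# Apply the helper separately to the rows of each state
mydata$numdays <- NA_integer_
for (s in unique(mydata$state)) {
  rows <- mydata$state == s
  mydata$numdays[rows] <- days_since_last(mydata$date[rows], mydata$accident[rows])
}

print(mydata)
```

The cummax() step plays the role of tidyr::fill here: it carries the position of the latest earlier accident forward until a newer one appears.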