R specify what week in a month
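I have some daily stock return data that needs to be converted to a weekly format. As you know, stocks only trade Monday through Friday, and I need to add up the daily returns to get the weekly cumulative return.
I considered using lubridate's week function, but how does lubridate know when a week starts? How can I make lubridate identify weeks from the weekday, i.e. treat "Monday" to "Friday" as one week?
I thought about writing a loop: if the data contains "Monday" through "Friday", I call that one week. But what should I use to tell R that we have entered the second week? And when we reach the end of the year and have 52 weeks, how do I reset the week count so that we move into the next year?
Here is the data (dput output):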
dat = structure(list(date = structure(c(4019, 4022, 4023, 4024, 4025,
4026, 4029, 4030, 4031, 4032, 4033, 4036, 4037, 4038, 4039, 4040,
4043, 4044, 4045, 4046, 4047, 4050, 4051, 4052, 4053, 4054, 4057,
4058, 4059, 4060, 4061, 4065, 4066, 4067, 4068, 4071, 4072, 4073,
4074, 4075), class = "Date"), weekday = c("Friday", "Monday",
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday"), COMP = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), week = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4,
4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8,
9, 9), RET = c(-0.005435, 0.040984, -0.015748, -0.021333, 0.002725,
0.01087, 0.024194, -0.002625, 0.013158, 0.033766, 0, -0.007538,
-0.005063, 0, -0.002545, 0.015306, 0.017588, -0.007407, 0.024876,
-0.009709, 0, -0.029412, 0.010101, 0.0075, -0.004963, 0.027431,
-0.002427, 0.007299, -0.009662, -0.004878, 0.014706, -0.004831,
0.004854, -0.009662, -0.021951, -0.014963, 0.005063, -0.005038,
0.010127, 0)), .Names = c("date", "weekday", "COMP", "week",
"RET"), row.names = c(NA, -40L), class = c("data.table", "data.frame"
))
library(data.table)
setDT(dat)
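This is two months of daily returns for company 1, from 1981-01-02 to 1981-02-27. Let's ignore computing the return for now and focus on the time first.
The week column was generated by lubridate's week() function. As you can see, the weeks are not what I want: the number changes in the middle of the trading week (on Thursday in this data) instead of on Monday.
The weekday column was generated by the weekdays() function.
I want, for example, 1981-01-02 to be week 1 (since only the Friday is present), 1981-01-05 through 1981-01-09 to be week 2, and so on.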
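If you want to count Mondays since the start of the dataset:
dat[, wk := {
  # For each row, roll back to the most recent Monday and return that Monday's row number
  w = dat[weekday == "Monday"][dat, on = .(date), roll = TRUE, which = TRUE]
  if (anyNA(w))
    # Rows before the first Monday come back NA: make them week 1 and shift the rest up by one
    1L + replace(w, is.na(w), 0L)
  else
    w
}]
How it works
Each row of dat is rolling-joined to the subset of dat where weekday == "Monday". The join rolls to the most recent date in the subset (on = .(date), roll = TRUE) and returns the row number within the subset that we land on (which = TRUE).
If the first day is not a Monday, we get missing values (for all days before the first Monday), so we replace them with 1 and increment all other row numbers by 1.
Oh, I guess there's also
dat[, wk := (first(weekday) != "Monday") + cumsum(weekday == "Monday")]
... because the logical condition first(weekday) != "Monday" counts as 0 if FALSE and 1 if TRUE.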
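To see why the one-liner works, here is a minimal base-R sketch on a made-up vector of weekday names (the vector wd and the variable offset are illustrative, not taken from the question's data). cumsum() counts the Mondays seen so far, and the logical offset adds 1 only when the series starts mid-week.

```r
# Hypothetical weekday sequence that starts mid-week (illustration only)
wd <- c("Friday", "Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Monday", "Tuesday")

# TRUE/FALSE are coerced to 1/0 in arithmetic
offset <- (wd[1] != "Monday")          # TRUE here, so it adds 1
wk <- offset + cumsum(wd == "Monday")  # running count of Mondays, shifted by the offset

print(wk)
```

The leading Friday ends up in week 1, and every later Monday starts a new week number.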
Here is a simpler (and, I think, easier to understand) way to solve this:
# if it's a Monday, mark as 1, 2, 3 and so on
dat[weekday == 'Monday', is_week := seq(.N)]
# forward fill the missing values
library(zoo)
dat[, is_week := na.locf(is_week, na.rm = FALSE, fromLast = FALSE)]
# rows before the first Monday are still NA: label them week 0
dat[is.na(is_week), is_week := 0L]
# find weekly average return
dat[, mean(RET), is_week]
is_week V1
1: 0 -0.005435000
2: 1 0.003499600
3: 2 0.013698600
4: 3 0.000032000
5: 4 0.005069600
6: 5 0.002131400
7: 6 -0.002950222
8: 7 -0.000962200
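One thing to watch with any approach that counts Mondays: a week with no Monday row never triggers a new counter value. In the question's data there is no row for Monday 1981-02-16 (the next row after Friday 1981-02-13 is Tuesday 1981-02-17), which is why group 6 above averages nine rows. The sketch below, in base R with made-up variable names, reproduces this on the affected dates and compares it with a calendar-based week number. It assumes your platform's format() supports the "%u" and "%V" conversions.

```r
# Dates around the missing Monday in the question's data (1981-02-16 is absent)
d <- as.Date(c("1981-02-09", "1981-02-10", "1981-02-11", "1981-02-12", "1981-02-13",
               "1981-02-17", "1981-02-18", "1981-02-19", "1981-02-20",
               "1981-02-23"))

# "%u" is the ISO weekday number (1 = Monday), which avoids locale-dependent names
is_monday <- format(d, "%u") == "1"

# Monday counter: the Tuesday-to-Friday rows stay in the previous group
wk_monday <- cumsum(is_monday)

# Calendar-based ISO week number: the short week gets its own value
wk_iso <- as.integer(format(d, "%V"))

print(data.frame(date = d, wk_monday, wk_iso))
```

If short weeks should count as weeks of their own, a calendar-based key is the safer choice.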
Using lubridate
You can use isoweek to define the week column.
library(lubridate)
dat[, wk := isoweek(date)]
which gives you
# date weekday COMP week RET wk
# 1: 1981-01-02 Friday 1 1 -0.005435 1
# 2: 1981-01-05 Monday 1 1 0.040984 2
# 3: 1981-01-06 Tuesday 1 1 -0.015748 2
# 4: 1981-01-07 Wednesday 1 1 -0.021333 2
# 5: 1981-01-08 Thursday 1 2 0.002725 2
# 6: 1981-01-09 Friday 1 2 0.010870 2
# 7: 1981-01-12 Monday 1 2 0.024194 3
# 8: 1981-01-13 Tuesday 1 2 -0.002625 3
# 9: 1981-01-14 Wednesday 1 2 0.013158 3
# 10: 1981-01-15 Thursday 1 3 0.033766 3
# 11: 1981-01-16 Friday 1 3 0.000000 3
# 12: 1981-01-19 Monday 1 3 -0.007538 4
# 13: 1981-01-20 Tuesday 1 3 -0.005063 4
# 14: 1981-01-21 Wednesday 1 3 0.000000 4
# 15: 1981-01-22 Thursday 1 4 -0.002545 4
# 16: 1981-01-23 Friday 1 4 0.015306 4
# 17: 1981-01-26 Monday 1 4 0.017588 5
# 18: 1981-01-27 Tuesday 1 4 -0.007407 5
# 19: 1981-01-28 Wednesday 1 4 0.024876 5
# 20: 1981-01-29 Thursday 1 5 -0.009709 5
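On the question of resetting the count at year end: ISO week numbers restart every year, so grouping on the week number alone would mix weeks from different years in a multi-year series. A hedged sketch, assuming lubridate is installed, pairs isoweek() with isoyear() to build a unique key. The dates and the name yw are made up for illustration.

```r
library(lubridate)

# Illustrative dates straddling year boundaries (not from the question's data)
d <- as.Date(c("1980-12-29", "1981-01-02", "1981-01-05", "1981-12-28", "1982-01-04"))

# ISO weeks run Monday to Sunday; isoyear() is the year that ISO week belongs to
yw <- sprintf("%d-W%02d", as.integer(isoyear(d)), as.integer(isoweek(d)))

print(data.frame(date = d, yw))
```

Note that 1980-12-29 falls in ISO week 1 of 1981, which is why isoyear() rather than year() is paired with isoweek() here.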
With dplyr, you can add the week column like this:
library(dplyr)
dat %>%
  mutate(wk = isoweek(date))
dat[, wk := .GRP, cut(date, 'week')]
head(dat, 20)
# date weekday COMP week RET wk
# 1: 1981-01-02 Friday 1 1 -0.005435 1
# 2: 1981-01-05 Monday 1 1 0.040984 2
# 3: 1981-01-06 Tuesday 1 1 -0.015748 2
# 4: 1981-01-07 Wednesday 1 1 -0.021333 2
# 5: 1981-01-08 Thursday 1 2 0.002725 2
# 6: 1981-01-09 Friday 1 2 0.010870 2
# 7: 1981-01-12 Monday 1 2 0.024194 3
# 8: 1981-01-13 Tuesday 1 2 -0.002625 3
# 9: 1981-01-14 Wednesday 1 2 0.013158 3
# 10: 1981-01-15 Thursday 1 3 0.033766 3
# 11: 1981-01-16 Friday 1 3 0.000000 3
# 12: 1981-01-19 Monday 1 3 -0.007538 4
# 13: 1981-01-20 Tuesday 1 3 -0.005063 4
# 14: 1981-01-21 Wednesday 1 3 0.000000 4
# 15: 1981-01-22 Thursday 1 4 -0.002545 4
# 16: 1981-01-23 Friday 1 4 0.015306 4
# 17: 1981-01-26 Monday 1 4 0.017588 5
# 18: 1981-01-27 Tuesday 1 4 -0.007407 5
# 19: 1981-01-28 Wednesday 1 4 0.024876 5
# 20: 1981-01-29 Thursday 1 5 -0.009709 5
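Note: for this data, this gives the same result as dat[, wk := lubridate::isoweek(date)], unless the data is not sorted by date. In that case, my solution groups the weeks in the same way, but wk won't be in ascending order. The first week might get 6, etc.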
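Once a week key exists, the weekly cumulative return the question asks for is a grouped aggregation. Below is a minimal sketch, assuming data.table is installed and using the first six rows of the question's data. Whether to sum the daily returns or compound them is a choice the thread leaves open, so both are shown. The names x, weekly, sum_ret, cum_ret and week_start are made up.

```r
library(data.table)

# First six rows of the question's data (dates and RET copied from the dput output)
x <- data.table(
  date = as.Date(c("1981-01-02", "1981-01-05", "1981-01-06",
                   "1981-01-07", "1981-01-08", "1981-01-09")),
  RET  = c(-0.005435, 0.040984, -0.015748, -0.021333, 0.002725, 0.010870)
)

# cut(date, "week") labels each row with the Monday that starts its week
weekly <- x[, .(
  sum_ret = sum(RET),            # simple sum of daily returns
  cum_ret = prod(1 + RET) - 1    # compounded return over the week
), by = .(week_start = cut(date, "week"))]

print(weekly)
```

Grouping on the week's starting date rather than a running counter also keeps the key unique across years.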