lubridate - select the first non-Monday of every week
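I have a large set of financial data that I want to filter by selecting the first non-Monday of every week. That is usually a Tuesday, but it can be a Wednesday when the Tuesday is a holiday.
Here is my code, which works in most cases: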
XLF <- quantmod::getSymbols("XLF", from = "2000-01-01", auto.assign = FALSE)
library(tibble)
library(lubridate)
library(dplyr)
xlf <- as_tibble(XLF) %>%
  rownames_to_column(var = "date") %>%
  select(date, XLF.Adjusted)
xlf$date <- ymd(xlf$date)
# We create Year, Month, ISO week number and day-of-week columns
# Then we remove all the Mondays
# (wday() numbers Sunday as 1 by default, so Monday is 2)
xlf <- xlf %>%
  mutate(Year = year(date), Month = month(date),
         IsoWeek = isoweek(date), WDay = wday(date)) %>%
  filter(WDay != 2)
# Creating another tibble just for ease of comparison
xlf2 <- xlf %>%
  group_by(Year, IsoWeek) %>%
  filter(row_number() == 1) %>%
  ungroup()
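That said, there are some issues I have not been able to resolve yet.
For example, it skips "2002-12-31", a Tuesday, because that date is treated as part of the first ISO week of 2003.
There are a few other similar cases.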
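To make the mismatch concrete, here is a minimal check of that one date (a sketch, not part of the original code). The calendar year and the ISO week-numbering year disagree, so grouping by `Year` and `IsoWeek` appears to put 2002-12-31 in the same group as the first week of January 2002, where it is never the first row. Pairing `isoweek()` with `isoyear()` instead of `year()` would keep the two consistent.

```r
library(lubridate)

d <- ymd("2002-12-31")

year(d)     # calendar year: 2002
isoweek(d)  # ISO week number: 1
isoyear(d)  # ISO week-numbering year: 2003
```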
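My question is: how can I select the first non-Monday of every week while staying in the tidyverse and without running into these issues (i.e. without having to use the xts / zoo classes)?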
You can create a continuously increasing week number yourself. It may not be the most elegant solution, but it works well for me.
as_tibble(XLF) %>%
  rownames_to_column(var = "date") %>%
  select(date, XLF.Adjusted) %>%
  mutate(date = ymd(date),
         Year = year(date),
         Month = month(date),
         WDay = wday(date),
         WDay_label = wday(date, label = TRUE)) %>%
  # if the weekday number is lower than in the line above or
  # if the date in the previous line is more than 6 days ago
  # the week number should be incremented
  mutate(week_increment = (WDay < lag(WDay) |
                             difftime(date, lag(date), units = "days") > 6)) %>%
  # the previous line causes the first element to be NA due to
  # the fact that the lag function can't find a line above
  # we correct this here by setting the first element to TRUE
  mutate(week_increment = ifelse(row_number() == 1,
                                 TRUE,
                                 week_increment)) %>%
  # we can sum the boolean elements in a cumulative way to get a week number
  mutate(week_number = cumsum(week_increment)) %>%
  filter(WDay != 2) %>%
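  # group by the running week number only: also grouping by Year would
  # split a week that straddles New Year and return two rows for it
  group_by(week_number) %>%
  filter(row_number() == 1) %>%
  ungroup()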
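As an alternative sketch (not the approach above, just another way to get a week key that does not reset at year boundaries), each date can be mapped to the Monday that starts its week with `floor_date(..., week_start = 1)`, and that Monday used as the grouping column. The `toy` tibble below is a made-up stand-in for the trading calendar around New Year 2003, with weekends and two assumed holidays removed. `picked` and `week_start_date` are illustrative names.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Hypothetical trading calendar: weekdays from 2002-12-23 to 2003-01-10,
# with 2002-12-25 and 2003-01-01 dropped as assumed market holidays.
toy <- tibble(date = seq(ymd("2002-12-23"), ymd("2003-01-10"), by = "day")) %>%
  filter(!wday(date, week_start = 7) %in% c(1, 7),            # drop Sun (1) and Sat (7)
         !date %in% ymd(c("2002-12-25", "2003-01-01")))

picked <- toy %>%
  # the Monday on or before each date identifies the week across year ends
  mutate(week_start_date = floor_date(date, unit = "week", week_start = 1)) %>%
  filter(wday(date, week_start = 7) != 2) %>%                 # drop Mondays
  group_by(week_start_date) %>%
  slice(1) %>%                                                # first remaining day per week
  ungroup()

picked$date
# "2002-12-24" "2002-12-31" "2003-01-07"
```

On the real data the same `mutate()` / `filter()` / `group_by()` / `slice()` chain should apply to the `date` column unchanged, provided the rows are sorted by date.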