Transform dates into dummy variables in R, taking holidays into account

Question

This post is related to my previous one, but it is more complicated. I have this data:

   df=structure(list(Data = structure(c(4L, 5L, 6L, 7L, 8L, 9L, 10L, 
1L, 2L, 3L), .Label = c("01.01.2018", "02.01.2018", "03.01.2018", 
"25.12.2017", "26.12.2017", "27.12.2017", "28.12.2017", "29.12.2017", 
"30.12.2017", "31.12.2017"), class = "factor"), Y = 1:10), .Names = c("Data", 
"Y"), class = "data.frame", row.names = c(NA, -10L))

I need to convert the dates into weekday dummy variables: each weekday column is 1 if the date falls on that day, and 0 otherwise.

The solution provided by Paweł Kozielski-Romaneczko helped me:

library(dplyr)
library(lubridate)
library(tidyr)


df %>%
  mutate(weekDay = lubridate::dmy(Data) %>% weekdays(),
         value = 1) %>%
  spread(key=weekDay, value=value, fill=0)

But now I also need to add columns for holidays, i.e. is the date a holiday or not?

I have an auxiliary dataset that lists which dates are holidays:

df1=structure(list(Data = structure(1:2, .Label = c("01.01.2018", 
"08.03.2018"), class = "factor"), name = structure(c(2L, 1L), .Label = c("International Women's Day", 
"New Year"), class = "factor")), .Names = c("Data", "name"), class = "data.frame", row.names = c(NA, 
-2L))

So I need output like this, with the holidays as extra columns:

Data       Y    Mon Tue Wed Thu Fri Sat Sun New Year    International Women's Day
25.12.2017  1   1   0   0   0   0   0   0   0                 0
26.12.2017  2   0   1   0   0   0   0   0   0                 0
27.12.2017  3   0   0   1   0   0   0   0   0                 0
28.12.2017  4   0   0   0   1   0   0   0   0                 0
29.12.2017  5   0   0   0   0   1   0   0   0                 0
30.12.2017  6   0   0   0   0   0   1   0   0                 0
31.12.2017  7   0   0   0   0   0   0   1   0                 0
01.01.2018  8   1   0   0   0   0   0   0   1                 0
02.01.2018  9   0   1   0   0   0   0   0   0                 0
03.01.2018  10  0   0   1   0   0   0   0   0                 0

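How can I add the holidays as dummy variables whose names are taken from the auxiliary dataset?

P.S. If you think this topic belongs in my previous post, let me know and I will delete it.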

Answer 1

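I just extended your example. Depending on your needs, use left_join or full_join. I used full_join so that "International Women's Day" shows up in the result, even though that date does not occur in df.

I use as.character to clean up name, because in your example it is a factor. If name is not a factor, as.character is not needed. Finally, I drop the No_Holiday column.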

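library(dplyr)
library(tidyr)

# full_join keeps holidays that are absent from df (their Y becomes NA)
df %>% full_join(df1, by = "Data") %>%
  mutate(weekDay = lubridate::dmy(Data) %>% weekdays(),
         # set the flag before the NA names are replaced, otherwise it is always 1
         holiday = ifelse(is.na(name), 0, 1),
         name = ifelse(is.na(name), "No_Holiday", as.character(name)),
         value = 1) %>%
  spread(key = weekDay, value = value, fill = 0) %>%   # one column per weekday
  spread(key = name, value = holiday, fill = 0) %>%    # one column per holiday
  select(-No_Holiday)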

         Data  Y Friday Monday Saturday Sunday Thursday Tuesday Wednesday International Women's Day New Year
1  01.01.2018  8      0      1        0      0        0       0         0                         0        1
2  02.01.2018  9      0      0        0      0        0       1         0                         0        0
3  03.01.2018 10      0      0        0      0        0       0         1                         0        0
4  08.03.2018 NA      0      0        0      0        1       0         0                         1        0
5  25.12.2017  1      0      1        0      0        0       0         0                         0        0
6  26.12.2017  2      0      0        0      0        0       1         0                         0        0
7  27.12.2017  3      0      0        0      0        0       0         1                         0        0
8  28.12.2017  4      0      0        0      0        1       0         0                         0        0
9  29.12.2017  5      1      0        0      0        0       0         0                         0        0
10 30.12.2017  6      0      0        1      0        0       0         0                         0        0
11 31.12.2017  7      0      0        0      1        0       0         0                         0        0
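
Note that full_join adds an 11th row (08.03.2018, with Y = NA), and weekdays() returns full, locale-dependent names in alphabetical column order. If you need the layout from the question instead (10 rows, Mon to Sun columns, and a column for every holiday in df1 even when it never occurs in df), a minimal sketch of an alternative could look like this. It is not the approach above: it builds the dummies directly instead of using spread. The day_labels vector and the use of plain character columns are my own assumptions for illustration.

```r
library(lubridate)

# Same values as in the question, stored as character instead of factor (assumption).
df <- data.frame(
  Data = c("25.12.2017", "26.12.2017", "27.12.2017", "28.12.2017", "29.12.2017",
           "30.12.2017", "31.12.2017", "01.01.2018", "02.01.2018", "03.01.2018"),
  Y = 1:10,
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  Data = c("01.01.2018", "08.03.2018"),
  name = c("New Year", "International Women's Day"),
  stringsAsFactors = FALSE
)

# Hand-written labels (assumption), so column names do not depend on the locale.
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

dates <- dmy(df$Data)
holiday_dates <- dmy(df1$Data)

# With week_start = 1, wday() returns 1 for Monday through 7 for Sunday.
weekday_idx <- wday(dates, week_start = 1)

# One 0/1 column per weekday, in Mon..Sun order.
weekday_dummies <- sapply(seq_along(day_labels),
                          function(i) as.integer(weekday_idx == i))
colnames(weekday_dummies) <- day_labels

# One 0/1 column per holiday listed in df1; sapply names the columns after df1$name.
holiday_dummies <- sapply(df1$name,
                          function(h) as.integer(dates %in% holiday_dates[df1$name == h]))

# cbind on a data frame keeps column names such as "New Year" unchanged.
result <- cbind(df, weekday_dummies, holiday_dummies)
print(result)
```

Because the holiday columns are driven by df1$name rather than by the joined rows, a holiday that falls outside the date range of df still gets an all-zero column, and no extra row is created.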


Tags: r, dataframe, lubridate, dplyr