Cohort based on week and month from UNIX timestamp

Question

I need to extract the calendar week plus year and the month plus year to mark the cohorts in my data.

Sample data:

da = data.frame(start_timestamp = c("1453598257", "1434619797","1456324104"))
da
  start_timestamp
1      1453598257
2      1434619797
3      1456324104

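I want to add the following variables:

startcalendarweek: the calendar week and year of start_timestamp (e.g. 4_2016)
startmonth: the month and year of start_timestamp (e.g. january2016)
cohort_startweek: a cohort index based on startcalendarweek (1 = week 1 of 2015, 2 = week 2 of 2015, etc.)
cohort_startmonth: a cohort index based on startmonth (1 = January 2015, 2 = February 2015, etc.)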
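A minimal base R sketch of the cohort_startweek definition, assuming ISO 8601 weeks and timestamps interpreted in UTC (the question states neither). ISO week 1 of 2015 starts on Monday 2014-12-29, so the cohort index is the number of whole weeks elapsed since that Monday, plus 1. The variable name week1_2015 is hypothetical.

```r
# Sample timestamps from the question (seconds since 1970-01-01 00:00:00 UTC)
ts <- c("1453598257", "1434619797", "1456324104")

# Assumption: timestamps are interpreted in UTC before taking the calendar date
date <- as.Date(as.POSIXct(as.numeric(ts), origin = "1970-01-01", tz = "UTC"))

# ISO 8601 week number paired with the ISO week-based year, e.g. "25_2015"
startcalendarweek <- format(date, "%V_%G")

# Hypothetical anchor: ISO week 1 of 2015 starts on Monday 2014-12-29
# (1 January 2015 was a Thursday)
week1_2015 <- as.Date("2014-12-29")

# Whole weeks elapsed since that Monday, plus 1 so that week 1 of 2015 maps to 1
cohort_startweek <- as.integer(date - week1_2015) %/% 7L + 1L

data.frame(ts, startcalendarweek, cohort_startweek)
```

Because 2015 has 53 ISO weeks, this continuous count gives 56 and 61 for the first and third rows, not the 55 and 60 shown in the expected output below. Dates before 2014-12-29 would get an index of 0 or less.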

Expected output:

da
  start_timestamp startcalendarweek   startmonth cohort_startweek cohort_startmonth
1      1453598257            4_2016  january2016               55                13
2      1434619797           25_2015     june2015               24                 6
3      1456324104            8_2016 february2016               60                14

Answer 1

You can try the following with lubridate functions:

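library(dplyr)
library(lubridate)
da = data.frame(start_timestamp = c("1453598257", "1434619797","1456324104"))

da %>%
  # parse the UNIX timestamps (seconds since 1970-01-01) as UTC date-times
  mutate(start_timestamp = as_datetime(as.numeric(start_timestamp)), 
         date = as.Date(start_timestamp), 
         # ISO 8601 week number with its matching ISO week-based year (%G, not %Y)
         startcalendarweek = format(date, '%V_%G'), 
         # full month name (locale-dependent) followed by the year
         startmonth = format(date, '%B%Y'), 
         # 1 January of the earliest year in the data is the cohort origin
         min_date = floor_date(min(date), 'year'),
         cohort_startweek = as.integer(round(difftime(date, min_date, units = 'weeks'))), 
         # approximate: treats every month as 30 days long
         cohort_startmonth = as.integer(round((date - min_date)/30)))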

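See ?strptime for the meaning of each conversion specification used in format. cohort_startmonth may be inaccurate, because it divides the difference in days by 30 to approximate the number of months, and not every month has 30 days.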
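The 30-day approximation can be avoided by computing the month index exactly from the calendar year and month. A minimal base R sketch, assuming UTC and the fixed origin of January 2015 from the question (the code above instead starts from the earliest year present in the data); base_year is a hypothetical name:

```r
# Sample timestamps from the question (seconds since 1970-01-01 00:00:00 UTC)
ts <- c("1453598257", "1434619797", "1456324104")

# Assumption: timestamps are interpreted in UTC before taking the calendar date
date <- as.Date(as.POSIXct(as.numeric(ts), origin = "1970-01-01", tz = "UTC"))

# POSIXlt counts years from 1900 and months from 0
lt <- as.POSIXlt(date)
year <- lt$year + 1900
month <- lt$mon + 1

# Exact month index: 1 = January 2015, 13 = January 2016, and so on
base_year <- 2015
cohort_startmonth <- (year - base_year) * 12 + month

# month.name is a built-in English constant, so this label ignores the locale
startmonth <- paste0(tolower(month.name[month]), year)

data.frame(ts, startmonth, cohort_startmonth)
```

With lubridate loaded, the same index is (year(date) - 2015) * 12 + month(date). Unlike format(date, '%B%Y'), building the label from month.name gives lowercase English names such as january2016 regardless of the session locale.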
