Looping through a data frame of datetimes

Question

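I am trying to create GPS schedules for the satellite transmitters I use to track the migration of a bird species I am studying. The function called 'sched_gps_fixes' (used below) takes a vector of datetimes and writes them to an .ASF file, which is then uploaded to the satellite transmitter. This tells the transmitter the dates and times at which to take a GPS fix. Using R and the sched_gps_fixes function lets me quickly create a GPS schedule that starts on any day of the year. The software that ships with the transmitters can do this too, but I would have to painstakingly select each date and time at which I want the transmitter to take a GPS location.

So I want to: 1) create a data frame containing every day of 2018, together with the time at which I want the transmitter to collect a GPS location, 2) use each row of the data frame as the start date of a sequence of datetimes (for example, starting from 2018-03-25 12:00:00, I want a GPS schedule that takes a GPS point every other day after that, so 2018-03-25 12:00:00, 2018-03-27 12:00:00, etc.), and 3) create an .ASF file for each GPS schedule. Here is a simplified version of what I am trying to accomplish: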

library(lubridate)

# set the beginning time
start_date <- ymd_hms('2018-01-01 12:00:00')

# create a sequence of datetimes starting January 1
days_df <- seq(start_date, start_date + days(10), by = '1 day')
tz(days_df) <- "America/Chicago"
days_df <- as.data.frame(days_df)
days_df

# to reproduce the example
days_df <- structure(list(days_df = structure(c(1514829600, 1514916000, 
1515002400, 1515088800, 1515175200, 1515261600, 1515348000, 1515434400, 
1515520800, 1515607200, 1515693600), class = c("POSIXct", "POSIXt"
), tzone = "America/Chicago")), .Names = "days_df", row.names = c(NA, 
-11L), class = "data.frame")

# the data frame looks like this:

days_df
1  2018-01-01 12:00:00
2  2018-01-02 12:00:00
3  2018-01-03 12:00:00
4  2018-01-04 12:00:00
5  2018-01-05 12:00:00
6  2018-01-06 12:00:00
7  2018-01-07 12:00:00
8  2018-01-08 12:00:00
9  2018-01-09 12:00:00
10 2018-01-10 12:00:00
11 2018-01-11 12:00:00

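I want to loop through each datetime in the data frame and create a vector for each row. Each vector would use that row's datetime as the start date of a GPS schedule that takes a point every 2 days (something like this):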

[1] "2018-01-01 12:00:00 UTC" "2018-01-03 12:00:00 UTC" "2018-01-05 12:00:00 UTC" "2018-01-07 12:00:00 UTC"
[5] "2018-01-09 12:00:00 UTC" "2018-01-11 12:00:00 UTC"

Each vector (or GPS schedule) would then be passed as 'gps_schedule' to the following function to create an .ASF file for the transmitter:

sched_gps_fixes(gps_schedule, tz = "America/Chicago", out_file = "./gps_fixes")

So I would like to know how to write a for loop that generates a vector of datetimes for each day of 2018. Here is pseudocode for what I am trying to do:

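# create a function called 'create_schedules' that loops over the start dates
# to make the GPS schedules and produce a .ASF file for each day of 2018

create_schedules <- function(days_df) {

  for (i in seq_len(nrow(days_df))) {

    # intended: a schedule starting at row i, one fix every 2 days for 10 days
    # (the result is not stored anywhere yet)
    seq(days_df$days_df[i], days_df$days_df[i] + days(10), by = '2 days')

  }
}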

# run the function
create_schedules(days_df)

I think I need an output object that stores each vector and names it by its start date, or something like that?

Thanks,

Jay Chou

Answer 1

One option is to use mapply to generate a schedule for each row, following the schedule definition the OP provided:

library(lubridate)

# For the sample data, max_date has to be computed from the data. To generate
# schedules for the whole of 2018, max_date can instead be set to 31-Dec-2018.
max_date <- max(days_df$days_df)

mapply(function(x) seq(x, max_date, by = "2 days"), days_df$days_df)

# Result: only the first 3 and last 2 items of the generated list are shown.
# [[1]]
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
# 
# [[2]]
# [1] "2018-01-02 12:00:00 CST" "2018-01-04 12:00:00 CST" "2018-01-06 12:00:00 CST"
# [4] "2018-01-08 12:00:00 CST" "2018-01-10 12:00:00 CST"
# 
# [[3]]
# [1] "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST" "2018-01-07 12:00:00 CST"
# [4] "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
# ....
# ....
# ....
# [[10]]
# [1] "2018-01-10 12:00:00 CST"
# 
# [[11]]
# [1] "2018-01-11 12:00:00 CST"
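
As a rough illustration of the comment about covering all of 2018, here is a minimal base-R sketch that builds one start date per day of the year and then one schedule per start date, each running to 31-Dec-2018. The 12:00 fix time and the America/Chicago time zone are taken from the question; the names tz, start_dates and schedules are only illustrative, and lapply is used in place of mapply.

```r
# Assumed from the question: fixes at 12:00 local time, America/Chicago
tz <- "America/Chicago"

# One start date per day of 2018
start_dates <- seq(as.POSIXct("2018-01-01 12:00:00", tz = tz),
                   as.POSIXct("2018-12-31 12:00:00", tz = tz),
                   by = "1 day")

# Last allowed fix: 31-Dec-2018 12:00
max_date <- max(start_dates)

# One schedule per start date: a fix every 2 days until max_date
schedules <- lapply(start_dates, function(x) seq(x, max_date, by = "2 days"))

length(schedules)       # number of schedules, one per day of 2018
schedules[[1]][1:3]     # first three fixes of the schedule starting 1 January
```

Stepping by "day" units keeps the wall-clock time at 12:00 across daylight saving changes, which a fixed step in seconds would not.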

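If the OP prefers the items of the resulting list to have names, mapply can be used as follows:

Update: per the OP's request, this generates a schedule that runs from each start date to start + 10 days. 10 days is equivalent to 10*24*3600 seconds.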

mapply(function(x, y) seq(y, y + 10*24*3600, by = "2 days"),
    as.character(days_df$days_df), days_df$days_df,
    SIMPLIFY = FALSE, USE.NAMES = TRUE)

#Result
# $`2018-01-01 12:00:00`
# [1] "2018-01-01 12:00:00 CST" "2018-01-03 12:00:00 CST" "2018-01-05 12:00:00 CST"
# [4] "2018-01-07 12:00:00 CST" "2018-01-09 12:00:00 CST" "2018-01-11 12:00:00 CST"
#.......
#.......
#.......so on
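
Since the question asked specifically for a for loop, here is a minimal base-R sketch of the same idea: preallocate a list, fill in one schedule per start date, name each element by its start date, and hand each schedule to a writer function. Note that write_schedule is a hypothetical stand-in for the OP's sched_gps_fixes, whose implementation is not shown in the question; it only writes the datetimes to a plain text file, so the real function and the .ASF output would need to be swapped in. The file naming scheme is likewise an assumption.

```r
# Start dates as in the question: 11 days at 12:00, America/Chicago
tz <- "America/Chicago"
start_dates <- seq(as.POSIXct("2018-01-01 12:00:00", tz = tz),
                   by = "1 day", length.out = 11)

# Hypothetical stand-in for sched_gps_fixes(): writes plain text, not .ASF
write_schedule <- function(gps_schedule, tz, out_file) {
  writeLines(format(gps_schedule, "%Y-%m-%d %H:%M:%S", tz = tz), out_file)
}

create_schedules <- function(start_dates, tz, out_dir = tempdir()) {
  # Preallocate the result and name each slot by its start date
  schedules <- vector("list", length(start_dates))
  names(schedules) <- format(start_dates, "%Y-%m-%d", tz = tz)

  for (i in seq_along(start_dates)) {
    # One fix every 2 days, from the start date through start + 10 days
    schedules[[i]] <- seq(start_dates[i], by = "2 days", length.out = 6)

    # One output file per schedule (file name pattern is an assumption)
    out_file <- file.path(out_dir,
                          paste0("gps_fixes_", names(schedules)[i], ".txt"))
    write_schedule(schedules[[i]], tz = tz, out_file = out_file)
  }

  schedules
}

schedules <- create_schedules(start_dates, tz = tz)
schedules[["2018-01-01"]]
```

Returning the named list from the function keeps every schedule available for inspection after the files have been written.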
