Determine windows/epochs in a time series and calculate the average

Question

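I'm working with time series data from neurophysiological recordings. These recordings usually contain 'markers' that flag the onset of events (for example, a stimulus presented on screen). I'm trying to subset specific windows/epochs based on certain markers and then average across these individual windows/epochs.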

To illustrate, here is a very simple example. My actual dataset has millions of data points, so an efficient solution would be best.

df <- data.frame(value = c(1:10, 101:110), #time series data
                 marker = c(NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA,  #event markers
                            NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA))

start <- which(df$marker == "start") #indices 3 and 13 are the 'start' markers
end <- which(df$marker == 'end') #indices 8 and 18 are the 'end' markers

window1 <- df$value[start[1]:end[1]] #first window (indices 3 to 8)
window2 <- df$value[start[2]:end[2]] #second window (indices 13 to 18)

averageWindow <- (window1 + window2) / 2 #average of the two windows
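The two-window version above can be generalized to any number of windows. The following is a minimal base-R sketch. It assumes each 'start' is followed by its own 'end' and that every window has the same length, because the element-wise average is not defined otherwise. The sketch builds a matrix of row indices with one column per window, extracts all values in a single subset, and averages across windows with `rowMeans()`. The names `len`, `idx` and `windows` are illustrative.

```r
df <- data.frame(value = c(1:10, 101:110),
                 marker = c(NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA,
                            NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA),
                 stringsAsFactors = FALSE)

start <- which(df$marker == "start")
end   <- which(df$marker == "end")

# Assumption: markers come in start/end pairs and all windows are equally long
stopifnot(length(start) == length(end), all(start < end))
len <- end[1] - start[1] + 1
stopifnot(all(end - start + 1 == len))

# Index matrix: row k holds sample offset k - 1, column j belongs to window j
idx <- outer(0:(len - 1), start, `+`)

# One subset for all windows, reshaped to len x n_windows, then averaged per row
windows <- matrix(df$value[idx], nrow = len)
averageWindow <- rowMeans(windows)
averageWindow
#> [1] 53 54 55 56 57 58
```

With about 1000 windows, the index matrix has 1000 columns, so this avoids an R-level loop over windows.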

Is this the most efficient approach? My actual data has nearly 1000 windows and about 1 million rows.

Answer 1

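I wasn't sure whether you want an average over all windows or an average for each window, so I computed both. Using your start and end, I subset the data with lapply(), which drops the rows outside the windows. I then combined the list of data frames with rbindlist(), which stores a window ID in a new column (.id). The final step computes the averages.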

library(data.table)

start <- which(df$marker == "start") #indices 3 and 13 are the 'start' markers
end <- which(df$marker == 'end') #indices 8 and 18 are the 'end' markers

temp <- rbindlist(lapply(seq_along(start), function(x) {
  df[start[x]:end[x], ]
}), idcol = TRUE)
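An alternative is to compute all row indices first and subset the table once. This avoids creating one data frame per window and then binding them together. The following is a sketch that assumes the same start/end pairing. The names `dt`, `lens` and `rows` are illustrative. Whether this approach is actually faster on your data should be confirmed with a benchmark.

```r
library(data.table)

df <- data.frame(value = c(1:10, 101:110),
                 marker = c(NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA,
                            NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA),
                 stringsAsFactors = FALSE)

dt <- as.data.table(df)
start <- which(dt$marker == "start")
end   <- which(dt$marker == "end")
stopifnot(length(start) == length(end), all(start < end))  # assumed pairing

lens <- end - start + 1                # number of rows in each window
rows <- unlist(Map(`:`, start, end))   # row indices of all windows, in order

temp <- dt[rows]                       # a single subset instead of one per window
temp[, .id := rep(seq_along(start), times = lens)]  # window ID, like idcol = TRUE
```

The resulting `temp` has the same `.id` and `value` columns as the `rbindlist()` version, so the averaging code that follows applies unchanged.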

# Overall: sum of all window values divided by the number of windows (mean window sum)
temp[, list(average = sum(value) / uniqueN(.id))]

#   average
#1:     333

# An average for each window
temp[, list(average = sum(value) / .N), by = .id]

#   .id average
#1:   1     5.5
#2:   2   105.5
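The value 333 is the mean *window sum*, not the mean of the individual samples. If the grand mean over every sample in every window is needed, `mean(value)` computes it directly. Also, `mean(value)` per group is equivalent to `sum(value) / .N`. The following is a small sketch that rebuilds `temp` as in the answer and computes both quantities. The column names `mean_window_sum` and `grand_mean` are illustrative.

```r
library(data.table)

df <- data.frame(value = c(1:10, 101:110),
                 marker = c(NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA,
                            NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA),
                 stringsAsFactors = FALSE)
start <- which(df$marker == "start")
end   <- which(df$marker == "end")
temp <- rbindlist(lapply(seq_along(start), function(x) df[start[x]:end[x], ]),
                  idcol = TRUE)

# Mean window sum (what sum(value) / uniqueN(.id) returns) vs. grand mean of all samples
overall <- temp[, .(mean_window_sum = sum(value) / uniqueN(.id),
                    grand_mean      = mean(value))]

# Per-window mean, same result as sum(value) / .N by .id
per_window <- temp[, .(average = mean(value)), by = .id]
```

Here `overall` contains 333 and 55.5, and `per_window` contains 5.5 and 105.5.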

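In response to the OP's comment, I came up with the following code. It numbers the 6 samples within each window (index 1 to 6) and computes the average at each position across windows.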

temp[, index := 1:.N, by = .id][,
    list(average = sum(value) / .N), by = index]

#   index average
#1:     1      53
#2:     2      54
#3:     3      55
#4:     4      56
#5:     5      57
#6:     6      58
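The data.table function `rowid()` returns the running position within each group. `rowid(.id)` therefore gives the same index as `1:.N` grouped by `.id`. The following sketch uses it for the per-sample average. It assumes all windows have the same length. If they do not, the later positions would silently be averaged over fewer windows. The name `avg_by_index` is illustrative.

```r
library(data.table)

df <- data.frame(value = c(1:10, 101:110),
                 marker = c(NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA,
                            NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA),
                 stringsAsFactors = FALSE)
start <- which(df$marker == "start")
end   <- which(df$marker == "end")
temp <- rbindlist(lapply(seq_along(start), function(x) df[start[x]:end[x], ]),
                  idcol = TRUE)

temp[, index := rowid(.id)]   # 1, 2, ..., N within each window
avg_by_index <- temp[, .(average = mean(value)), by = index]
```

The `average` column of `avg_by_index` contains 53 through 58, which matches the output above.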

Data

df <- data.frame(value = c(1:10, 101:110), #time series data
                 marker = c(NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA,  #event markers
                            NA, NA, 'start', NA, NA, NA, NA, 'end', NA, NA),
                 stringsAsFactors = FALSE)
