How do I create new rows so that every time series has the same length?

Question

I am trying to classify the effectiveness of a treatment. Each id should contain 4 timeframes.

Dataframe

id	timeframe	distance
1	1	1.1
1	2	1.1
1	3	1.2
1	4	1.1
2	1	1.1
2	2	1.1
2	4	1.1

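The problem is that, for example, timeframe 3 is missing for id 2. How can I add a new row for each missing timeframe, filled with the mean distance value, for all ids that have this problem?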

I get 'not all time is the same length' when running longitudinal clustering with longitudinal k-means (KML).

Answer 1

We can use complete to create the missing combinations, and then replace the resulting NA values with the mean.

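library(dplyr)
library(tidyr)
df1 %>%
    # rn marks the original rows; rows added by complete() get rn = NA
    mutate(rn = row_number()) %>%
    # add every id/timeframe combination that is missing
    complete(id, timeframe) %>%
    # fill only the newly added rows with the overall mean distance
    mutate(distance = replace(distance, is.na(distance) & is.na(rn), 
          mean(distance, na.rm = TRUE)))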
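
To make the snippet above reproducible, here is a minimal sketch that rebuilds the data frame from the question and applies the same steps. The name `filled` and the final `select(-rn)` are additions for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Data from the question: id 2 has no row for timeframe 3
df1 <- data.frame(
  id        = c(1, 1, 1, 1, 2, 2, 2),
  timeframe = c(1, 2, 3, 4, 1, 2, 4),
  distance  = c(1.1, 1.1, 1.2, 1.1, 1.1, 1.1, 1.1)
)

filled <- df1 %>%
  mutate(rn = row_number()) %>%      # flag the original rows
  complete(id, timeframe) %>%        # add the missing id/timeframe rows
  mutate(distance = replace(distance, is.na(distance) & is.na(rn),
                            mean(distance, na.rm = TRUE))) %>%
  select(-rn)                        # drop the helper column (illustrative)

filled
```

With this data, the new row for id 2, timeframe 3 receives the overall mean of the seven observed distances, 7.8 / 7, which is about 1.114.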

If the mean should instead be computed within each 'id', add a group_by before the mutate.

df1 %>%
    mutate(rn = row_number()) %>%
    complete(id, timeframe) %>%
    # group so that mean() is computed separately for each id
    group_by(id) %>%
    mutate(distance = replace(distance, is.na(distance) & is.na(rn), 
          mean(distance, na.rm = TRUE))) %>%
    ungroup()
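
One caveat: `complete(id, timeframe)` only expands over timeframe values that occur somewhere in the data, so a timeframe missing for every id would not be added. The sketch below is a variant under that assumption: it passes the expected range explicitly as `timeframe = 1:4` (taken from the question's "4 timeframes"), uses the per-id mean, and then counts rows per id. The names `filled_by_id` and `rows_per_id` are illustrative.

```r
library(dplyr)
library(tidyr)

# Data from the question, with integer ids and timeframes
df1 <- data.frame(
  id        = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  timeframe = c(1L, 2L, 3L, 4L, 1L, 2L, 4L),
  distance  = c(1.1, 1.1, 1.2, 1.1, 1.1, 1.1, 1.1)
)

filled_by_id <- df1 %>%
  mutate(rn = row_number()) %>%
  complete(id, timeframe = 1:4) %>%  # assumed range: timeframes 1 to 4
  group_by(id) %>%
  mutate(distance = replace(distance, is.na(distance) & is.na(rn),
                            mean(distance, na.rm = TRUE))) %>%
  ungroup() %>%
  select(-rn)

# Every id should now have exactly 4 rows
rows_per_id <- count(filled_by_id, id)
rows_per_id
```

Here the row added for id 2, timeframe 3 gets the mean of id 2's own observed distances, which is 1.1.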

r

time-series