Aggregate multiple variables based on ID for different timestamps in R

Question

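I am having trouble computing the mean of two variables in my data. I collected temperature and speed readings, as shown below. I want to reduce the data so that it contains only hourly readings (no minutes). For each ID, I want to take the mean of temperature and speed, grouped by day and hour.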

ID    temp  Speed   Day    Hour    Minute    Latitude    Longitude
1      3      20    1      11      10      38.9294865  -77.2479055
1      5      25    1      11      30      38.9294865  -77.2479055
1      5      30    1      12      12      38.9294865  -77.2479055
1      6      20    1      12      40      38.9294865  -77.2479055
2      1      40    2      11      05      38.9294771  -77.2478712
2      5      30    2      11      50      38.9294771  -77.2478712
2      2      20    2      12      30      38.9294771  -77.2478712
2      8      10    2      12      40      38.9294771  -77.2478712

The data I want looks like this:

ID    temp  Speed   Day    Hour    Minute    Latitude    Longitude
1      4      22.5   1      11      00      38.9294865  -77.2479055
1      5.5    25     1      12      00      38.9294865  -77.2479055
2      3      35     2      11      00      38.9294771  -77.2478712
2      5      15     2      12      00      38.9294771  -77.2478712

I thought of creating a column that combines the hour and minute, like this:

Data$HM <- as.date(with(Data, paste(Hour, Minute ,sep=":")), "%H:%M")

Then, based on the new column, I wanted to try this code:

AvrgData<- aggregate(Data[, 2:3], list(Data$HM), mean)

But my code is not correct. Any suggestions would be greatly appreciated! I went through these links but still did not get the result I want.

How to average columns based on ID in R?

Merge three different columns into a date in R

Thanks

Answer 1

Using aggregate, you can get the mean of temp and Speed grouped by ID, Day, Hour, Latitude, and Longitude, and then create a Minute column with the value 0.

transform(aggregate(cbind(temp, Speed)~ID + Day + Hour + Latitude + Longitude, 
          df, mean), Minute = 0)

#  ID Day Hour Latitude Longitude temp Speed Minute
#1  1   1   11    38.93    -77.25  4.0  22.5      0
#2  1   1   12    38.93    -77.25  5.5  25.0      0
#3  2   2   11    38.93    -77.25  3.0  35.0      0
#4  2   2   12    38.93    -77.25  5.0  15.0      0
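
The question's own attempt used the non-formula interface of aggregate, so it may help to see that form as well. The following is only a minimal sketch: it rebuilds the sample values from the question without the coordinate columns, groups by ID, Day, and Hour, and stores the result in an illustrative variable named hourly (not a name from the original post).

```r
# Sample data copied from the question's table (coordinates omitted for brevity).
df <- data.frame(
  ID    = c(1, 1, 1, 1, 2, 2, 2, 2),
  temp  = c(3, 5, 5, 6, 1, 5, 2, 8),
  Speed = c(20, 25, 30, 20, 40, 30, 20, 10),
  Day   = c(1, 1, 1, 1, 2, 2, 2, 2),
  Hour  = c(11, 11, 12, 12, 11, 11, 12, 12)
)

# Non-formula interface: the columns to average come first,
# followed by a named list of grouping vectors.
hourly <- aggregate(
  df[, c("temp", "Speed")],
  by  = list(ID = df$ID, Day = df$Day, Hour = df$Hour),
  FUN = mean
)

# Sort explicitly so the row order does not depend on how aggregate orders groups.
hourly <- hourly[order(hourly$ID, hourly$Day, hourly$Hour), ]

# Every remaining row stands for a whole hour.
hourly$Minute <- 0

print(hourly)
```

Grouping by the combined hour-and-minute column alone, as in the question, would keep the minutes and would pool readings from different IDs and days, which is why the grouping here uses ID, Day, and Hour.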

The same logic implemented in dplyr:

library(dplyr)

df %>%
  group_by(ID, Day, Hour, Latitude, Longitude) %>%
  summarise_at(vars(temp, Speed), mean) %>%
  mutate(Minute = 0)
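
In dplyr 1.0.0 and later, summarise_at is superseded by across(). A hedged sketch of the same grouping in that style follows, assuming a recent dplyr is installed; the variable name hourly is illustrative and the sample values are copied from the question.

```r
library(dplyr)

# Sample data copied from the question's table.
df <- data.frame(
  ID        = c(1, 1, 1, 1, 2, 2, 2, 2),
  temp      = c(3, 5, 5, 6, 1, 5, 2, 8),
  Speed     = c(20, 25, 30, 20, 40, 30, 20, 10),
  Day       = c(1, 1, 1, 1, 2, 2, 2, 2),
  Hour      = c(11, 11, 12, 12, 11, 11, 12, 12),
  Latitude  = rep(c(38.9294865, 38.9294771), each = 4),
  Longitude = rep(c(-77.2479055, -77.2478712), each = 4)
)

hourly <- df %>%
  group_by(ID, Day, Hour, Latitude, Longitude) %>%
  # across() applies mean to both measurement columns;
  # .groups = "drop" returns an ungrouped tibble.
  summarise(across(c(temp, Speed), mean), .groups = "drop") %>%
  mutate(Minute = 0)

print(hourly)
```

Dropping the groups at the end is a design choice: it avoids carrying leftover grouping into later steps that are meant to operate on the whole table.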

Data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), temp = c(3L, 
5L, 5L, 6L, 1L, 5L, 2L, 8L), Speed = c(20L, 25L, 30L, 20L, 40L, 
30L, 20L, 10L), Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Hour = c(11L, 
11L, 12L, 12L, 11L, 11L, 12L, 12L), Minute = c(10L, 30L, 12L, 
40L, 5L, 50L, 30L, 40L), Latitude = c(38.9294865, 38.9294865, 
38.9294865, 38.9294865, 38.9294771, 38.9294771, 38.9294771, 38.9294771
), Longitude = c(-77.2479055, -77.2479055, -77.2479055, -77.2479055, 
-77.2478712, -77.2478712, -77.2478712, -77.2478712)), class = "data.frame", 
row.names = c(NA, -8L))
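
The question also aimed for a single hour-and-minute column. If such a label is wanted on the aggregated result, one possible sketch is to format the Hour column with sprintf. The small hourly data frame below is typed in by hand from the means shown in the answer, and the column name HM is taken from the question's attempt.

```r
# Aggregated means, entered by hand from the answer's output.
hourly <- data.frame(
  ID    = c(1, 1, 2, 2),
  Day   = c(1, 1, 2, 2),
  Hour  = c(11, 12, 11, 12),
  temp  = c(4, 5.5, 3, 5),
  Speed = c(22.5, 25, 35, 15)
)

# Build a zero-padded "HH:00" text label; the minutes are always 00
# because each row represents a whole hour.
hourly$HM <- sprintf("%02d:00", as.integer(hourly$Hour))

print(hourly)
```

The label is plain text rather than a date-time, because Day in the sample data is only a day index and not a calendar date.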
