How to calculate mean in a dataframe?

Here is a sample of the data frame:

key_date Particles         PM    timestamp                date airport ws wd tempi humidity
1  2017-04-25 0.0000000   0.000000 1.493132e+12 2017-04-25 15:45:53    <NA> NA NA    NA       NA
2  2017-04-25 0.0000000   0.000000 1.493132e+12 2017-04-25 15:46:23    <NA> NA NA    NA       NA
3  2017-04-25 0.0000000   0.000000 1.493132e+12 2017-04-25 15:46:53    <NA> NA NA    NA       NA
4  2017-04-25 1.5333300  91.269643 1.493132e+12 2017-04-25 15:47:23    <NA> NA NA    NA       NA
5  2017-04-25 1.7733300 105.555357 1.493132e+12 2017-04-25 15:47:53    <NA> NA NA    NA       NA
6  2017-04-25 0.0000000   0.000000 1.493132e+12 2017-04-25 15:48:23    <NA> NA NA    NA       NA
7  2017-04-25 0.4100000  24.404762 1.493132e+12 2017-04-25 15:48:53    <NA> NA NA    NA       NA
8  2017-04-25 0.0933333   5.555554 1.493132e+12 2017-04-25 15:49:23    <NA> NA NA    NA       NA
9  2017-04-25 0.2166670  12.896845 1.493132e+12 2017-04-25 15:49:53    <NA> NA NA    NA       NA
10 2017-04-25 0.0000000   0.000000 1.493132e+12 2017-04-25 15:50:23    <NA> NA NA    NA       NA

Normally I apply the mean through openair when plotting, for example:

timePlot(mergedDf, pollutant = c("Particles"), group = TRUE, avg.time = "1 min")

But how can I apply the mean to mergedDf itself, rather than through openair?

I tried:

mergedDf <- mergedDf[,list(avg=mean(Particles)),by='1 min']

I get this error:

Error in [.data.frame(mergedDf, , list(avg = mean(Particles)), by = "1 min") : unused argument (by = "1 min")

Any ideas on how to do this properly?

Edit:

Sample data:

> dput(mergedDf[1:20, ])
structure(list(key_date = c("2017-04-25", "2017-04-25", "2017-04-25", 
"2017-04-25", "2017-04-25", "2017-04-25", "2017-04-25", "2017-04-25", 
"2017-04-25", "2017-04-25", "2017-04-25", "2017-04-25", "2017-04-25", 
"2017-04-25", "2017-04-25", "2017-04-25", "2017-04-25", "2017-04-25", 
"2017-04-25", "2017-04-25"), Particles = c(0, 0, 0, 1.53333, 
1.77333, 0, 0.41, 0.0933333, 0.216667, 0, 0, 0, 0.126667, 0.226667, 
0.103333, 0.26, 0.206667, 0.473333, 0, 0), PM = c(0, 0, 0, 91.2696428571429, 
105.555357142857, 0, 24.4047619047619, 5.55555357142857, 12.8968452380952, 
0, 0, 0, 7.53970238095238, 13.4920833333333, 6.15077380952381, 
15.4761904761905, 12.3016071428571, 28.1745833333333, 0, 0), 
    timestamp = c(1493131553332, 1493131583376, 1493131613410, 
    1493131643467, 1493131673527, 1493131703573, 1493131733617, 
    1493131763676, 1493131793730, 1493131823777, 1493131853791, 
    1493131883866, 1493131913922, 1493131943948, 1493131973986, 
    1493132004055, 1493132034084, 1493132064145, 1493132094211, 
    1493132124236), date = structure(c(1493131553.332, 1493131583.376, 
    1493131613.41, 1493131643.467, 1493131673.527, 1493131703.573, 
    1493131733.617, 1493131763.676, 1493131793.73, 1493131823.777, 
    1493131853.791, 1493131883.866, 1493131913.922, 1493131943.948, 
    1493131973.986, 1493132004.055, 1493132034.084, 1493132064.145, 
    1493132094.211, 1493132124.236), class = c("POSIXct", "POSIXt"
    ), tzone = "UTC-1"), airport = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_), ws = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
    ), wd = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_), tempi = c(NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
    ), humidity = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_, NA_real_)), .Names = c("key_date", "Particles", 
"PM", "timestamp", "date", "airport", "ws", "wd", "tempi", "humidity"
), row.names = c(NA, 20L), class = "data.frame")

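The call mergedDf <- mergedDf[,list(avg=mean(Particles)),by='1 min'] is written in data.table syntax, but mergedDf is a plain data.frame, and as the error message says, [.data.frame has no by argument. Even on a data.table, by has to name a column or a grouping expression, not an interval string such as '1 min'.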
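What the attempted call seems to be after, one mean per 1-minute bin, can be sketched in base R with cut() and aggregate(). This is only a sketch under assumptions: the small demo data frame and the minute column are made up for illustration, and I am assuming fixed (non-overlapping) 1-minute bins are what is wanted.

```r
# Hypothetical sample: six readings taken 30 s apart (not the real mergedDf)
demo <- data.frame(
  date = as.POSIXct("2017-04-25 15:45:00", tz = "UTC") + seq(0, 150, by = 30),
  Particles = c(0, 2, 1, 3, 4, 6)
)

# Label every row with the start of the 1-minute bin it falls into
demo$minute <- cut(demo$date, breaks = "1 min")

# One mean per bin
binned <- aggregate(Particles ~ minute, data = demo, FUN = mean)
print(binned)
```

With the real data, the same two steps would be applied to mergedDf$date and mergedDf$Particles.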

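As far as I can tell, base R has no dedicated rolling-mean function. The solution I found is the following (you did not specify the expected output, so I can only guess):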

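df <- mergedDf  # the data frame from the question (see the dput() output above)
# width counts rows, not seconds: readings are ~30 s apart, so 1 minute = 2 rows
df$mean <- zoo::rollapply(df$Particles, width = 2, FUN = mean, fill = NA, align = "right")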

Notes:

  • You can convert a POSIXct value to numeric (seconds since the origin), so 1 minute corresponds to 60 seconds.
  • I'm afraid your original version is simpler...
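The first note can be made concrete: converting the timestamps to seconds shows how far apart the readings are, which in turn gives the number of rows in a 1-minute window. The sketch below uses made-up data (demo, step and rows_per_min are illustrative names) and, instead of zoo, uses stats::filter(), which ships with R, for the trailing mean. Treat it as one possible approach under those assumptions, not as the recommended one.

```r
# Hypothetical sample: six readings taken 30 s apart (not the real mergedDf)
demo <- data.frame(
  date = as.POSIXct("2017-04-25 15:45:00", tz = "UTC") + seq(0, 150, by = 30),
  Particles = c(0, 2, 1, 3, 4, 6)
)

secs <- as.numeric(demo$date)      # seconds since 1970-01-01 UTC
step <- median(diff(secs))         # typical gap between readings
rows_per_min <- round(60 / step)   # number of rows in a 1-minute window

# Trailing mean over rows_per_min rows; the first rows_per_min - 1 values are NA
weights <- rep(1 / rows_per_min, rows_per_min)
demo$roll_mean <- as.numeric(stats::filter(demo$Particles, weights, sides = 1))
print(demo)
```

This assumes the readings are evenly spaced; with irregular gaps, a row-count window no longer corresponds to a fixed time span.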

Hope this helps.