Get the mean of all columns of a data frame in R
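I have a data frame with many columns. Each column represents one day of the year (I have 365 columns), and each row holds the average temperature for a particular city. I want the mean across all columns, i.e. the average temperature for the whole year. I would also like the mean for each month (01 for January, 02 for February, and so on) and for each quarter of the year.
My data looks like this: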
data <- data.frame(City = c("London", "Stockholm", "Paris", "Prag", "Berlin", "Copenhagen"),
`20100101` = c(4, 5, 3, 4, 6, 7), `20100102` = c(2, 5, 8, 6, 1, 3),
`20100205` = c(4, 7, 6, 1, 3, 4), `20100305` = c(0, 3, 7, 9, 3, 2),
`20100525` = c(9, 8, 7, 6, 5, 4), `20100719` = c(9, 10, 5, 6, 7, 8),
`20101011` = c(15, 3, 5, 7, 8, 9), `20101112` = c(3, 7, 1, 1, 1, 1),
`20101212` = c(0, 0, 0, 5, 2, 1),
check.names = FALSE)  # keep the date strings as column names (no "X" prefix)
How can I extract the yearly, monthly and quarterly means?
This is much easier to work with if you reshape the data into long format.
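library(dplyr)
# One row per City/date pair; pivot_longer() names the new columns `name` and `value`
long_data <- data %>%
  tidyr::pivot_longer(cols = -City) %>%
  mutate(name = as.Date(name, '%Y%m%d'))  # e.g. "20100101" becomes the Date 2010-01-01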
Once you have that, getting each city's yearly, quarterly and monthly mean temperature is straightforward.
# Yearly mean per city
long_data %>%
  group_by(City) %>%
  summarise(year_mean = mean(value, na.rm = TRUE))
Monthly means:
long_data %>%
  group_by(City, month = lubridate::month(name)) %>%
  # For quarterly means, use this grouping instead:
  # group_by(City, quarter = lubridate::quarter(name)) %>%
  summarise(month_mean = mean(value, na.rm = TRUE))
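If dplyr, tidyr and lubridate are not available, the same long-format idea can be sketched in base R. This swaps in a manual reshape with `rep()`/`unlist()` and `aggregate()` in place of `pivot_longer()`, `group_by()` and `summarise()`; the names `long_base`, `quarter_tbl` and `month_tbl` are illustrative and not part of the original answer.

```r
# Example data from the question; check.names = FALSE keeps the date strings as column names
data <- data.frame(
  City = c("London", "Stockholm", "Paris", "Prag", "Berlin", "Copenhagen"),
  `20100101` = c(4, 5, 3, 4, 6, 7), `20100102` = c(2, 5, 8, 6, 1, 3),
  `20100205` = c(4, 7, 6, 1, 3, 4), `20100305` = c(0, 3, 7, 9, 3, 2),
  `20100525` = c(9, 8, 7, 6, 5, 4), `20100719` = c(9, 10, 5, 6, 7, 8),
  `20101011` = c(15, 3, 5, 7, 8, 9), `20101112` = c(3, 7, 1, 1, 1, 1),
  `20101212` = c(0, 0, 0, 5, 2, 1),
  check.names = FALSE
)

# Long format in base R: unlist() stacks the date columns one after another,
# so each city repeats once per column and each date repeats once per city
date_cols <- names(data)[-1]
long_base <- data.frame(
  City  = rep(data$City, times = length(date_cols)),
  date  = as.Date(rep(date_cols, each = nrow(data)), format = "%Y%m%d"),
  value = unlist(data[-1], use.names = FALSE)
)

# Grouping labels: quarters() gives "Q1".."Q4"; format "%m" gives "01".."12"
long_base$quarter <- quarters(long_base$date)
long_base$month   <- format(long_base$date, "%m")

# One row per City/period. The formula interface drops NA rows by default
# (na.action = na.omit), which plays the role of na.rm = TRUE here
quarter_tbl <- aggregate(value ~ City + quarter, data = long_base, FUN = mean)
month_tbl   <- aggregate(value ~ City + month,   data = long_base, FUN = mean)
```

Unlike the lubridate version, the month labels here are strings ("01", "02", ...) rather than integers, which keeps them aligned with the "01 (January)" naming in the question.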
We can do this in base R with rowMeans and split.default.
# Convert the date column names to `Date` class
dates <- as.Date(names(data)[-1], "%Y%m%d")
# Yearly mean: row-wise mean of all date columns (every column except the first, City)
city_means <- rowMeans(data[-1], na.rm = TRUE)
names(city_means) <- data$City
# Split the columns into a list of data frames, one per month, then loop over
# the list with sapply and take the rowMeans of each element.
# "%m" gives "01", "02", ..., which sorts chronologically; "%b" (month
# abbreviations) would sort alphabetically and depends on the locale
month_means <- sapply(split.default(data[-1], format(dates, "%m")),
                      rowMeans, na.rm = TRUE)
row.names(month_means) <- data$City
# Split by year and quarter ("2010 Q1", "2010 Q2", ...) and take the rowMeans of each list element
quarter_means <- sapply(split.default(data[-1], paste(format(dates, "%Y"), quarters(dates))),
                        rowMeans, na.rm = TRUE)
row.names(quarter_means) <- data$City
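The base R steps share one pattern: turn the column names into dates, map each date to a group label, then average the columns within each group. A minimal sketch wraps that pattern in a helper. `period_means()` is a hypothetical name, and passing the grouping as a function from a `Date` vector to labels is an assumption of this sketch, not part of the original answer; it also assumes more than one city (row).

```r
# Example data from the question; check.names = FALSE keeps the date strings as column names
data <- data.frame(
  City = c("London", "Stockholm", "Paris", "Prag", "Berlin", "Copenhagen"),
  `20100101` = c(4, 5, 3, 4, 6, 7), `20100102` = c(2, 5, 8, 6, 1, 3),
  `20100205` = c(4, 7, 6, 1, 3, 4), `20100305` = c(0, 3, 7, 9, 3, 2),
  `20100525` = c(9, 8, 7, 6, 5, 4), `20100719` = c(9, 10, 5, 6, 7, 8),
  `20101011` = c(15, 3, 5, 7, 8, 9), `20101112` = c(3, 7, 1, 1, 1, 1),
  `20101212` = c(0, 0, 0, 5, 2, 1),
  check.names = FALSE
)

# Hypothetical helper: average the date columns of `df` within groups defined
# by `key`, a function that maps a Date vector to group labels
period_means <- function(df, key) {
  dates <- as.Date(names(df)[-1], format = "%Y%m%d")
  # split.default() splits the columns (not the rows) of df[-1] into a list of
  # data frames, one per distinct label
  groups <- split.default(df[-1], key(dates))
  # vapply() checks that every group yields one numeric value per city;
  # the result is a cities x periods matrix
  res <- vapply(groups, rowMeans, numeric(nrow(df)), na.rm = TRUE)
  rownames(res) <- df$City
  res
}

yearly    <- period_means(data, function(d) format(d, "%Y"))
monthly   <- period_means(data, function(d) format(d, "%m"))
quarterly <- period_means(data, function(d) paste(format(d, "%Y"), quarters(d)))
```

Individual values can then be read by city and period, e.g. `quarterly["London", "2010 Q1"]`.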
Data
data <- structure(list(City = c("London", "Stockholm", "Paris", "Prag",
"Berlin", "Copenhagen"), `20100101` = c(4, 5, 3, 4, 6, 7), `20100102` = c(2,
5, 8, 6, 1, 3), `20100205` = c(4, 7, 6, 1, 3, 4), `20100305` = c(0,
3, 7, 9, 3, 2), `20100525` = c(9, 8, 7, 6, 5, 4), `20100719` = c(9,
10, 5, 6, 7, 8), `20101011` = c(15, 3, 5, 7, 8, 9), `20101112` = c(3,
7, 1, 1, 1, 1), `20101212` = c(0, 0, 0, 5, 2, 1)),
class = "data.frame", row.names = c(NA,
-6L))