Output mean as NA if the total count of missing values exceeds a limit in R
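I am trying to average the values that belong to a particular group. Any number of values in a group may be missing. What I want is this: if the number of missing values in a group exceeds some limit (say 3), the output mean should be NA; otherwise it should be the mean of the remaining values, ignoring the NAs.
Below is the code I tried, with sample data: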
df <- structure(list(Year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017), Week = c(44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45), value = c(11.1, 11.6, 26.8, 35.4, 41.5, 9.8, 59.8, 62.9, NaN, 13, 8.7, NaN, NaN, 1.7, NaN, NaN, 12, 18.5, 28.2, 27.3, 42.5, 29.8, 33.1, 35.2, 23.2, 7.2, 2.1, 2.3, 7.8, 3.4)), row.names = c(NA, 30L), class = "data.frame")
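library(dplyr)
# summarise_each() and funs() are deprecated; across() is the current equivalent
out1 <- df %>% group_by(Year, Week) %>% summarise(across(everything(), mean)) # or
out2 <- df %>% group_by(Year, Week) %>% summarise(across(everything(), ~mean(.x, na.rm = TRUE)))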
This will compute the mean of every column except the grouping columns.
df <- structure(list(Year = c(2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017), Week = c(44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45, 45), value = c(11.1, 11.6, 26.8, 35.4, 41.5, 9.8, 59.8, 62.9, NaN, 13, 8.7, NaN, NaN, 1.7, NaN, NaN, 12, 18.5, 28.2, 27.3, 42.5, 29.8, 33.1, 35.2, 23.2, 7.2, 2.1, 2.3, 7.8, 3.4)), row.names = c(NA, 30L), class = "data.frame")
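library(tidyverse)

n <- 3  # maximum number of missing values allowed per group

df %>%
  group_by(Year, Week) %>%
  # For every non-grouping column, compute the NA-ignoring mean and the NA count
  summarise(across(everything(),
                   .fns = list(Mean = ~mean(.x, na.rm = TRUE),
                               na_vals = ~sum(is.na(.x))),
                   .names = "{.col}.{.fn}"),
            .groups = 'drop') %>%
  # Set a mean to NA when its column's NA count exceeds n
  mutate(across(ends_with('.Mean'),
                ~ifelse(get(str_replace(cur_column(), fixed('.Mean'), '.na_vals')) > n,
                        NA, .))) %>%
  select(!ends_with('.na_vals')) %>%
  rename_with(~str_remove(., fixed('.Mean')), ends_with('.Mean'))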
#> # A tibble: 2 x 3
#>    Year  Week value
#>   <dbl> <dbl> <dbl>
#> 1  2017    44  NA
#> 2  2017    45  19.5
Created on 2021-05-10 by the reprex package (v2.0.0)
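The same rule can also be wrapped in a small helper, which avoids the intermediate `.Mean` and `.na_vals` columns. The following is only a sketch: `mean_na_limit` and its `limit` argument are hypothetical names introduced here, not part of dplyr, and the data frame is rebuilt from the question's values so the snippet runs on its own.

```r
library(dplyr)

# Sample data from the question (16 rows for week 44, 14 rows for week 45)
df <- data.frame(
  Year  = 2017,
  Week  = rep(c(44, 45), times = c(16, 14)),
  value = c(11.1, 11.6, 26.8, 35.4, 41.5, 9.8, 59.8, 62.9, NaN, 13, 8.7,
            NaN, NaN, 1.7, NaN, NaN,
            12, 18.5, 28.2, 27.3, 42.5, 29.8, 33.1, 35.2, 23.2, 7.2, 2.1,
            2.3, 7.8, 3.4)
)

# Hypothetical helper: NA when more than `limit` values are missing,
# otherwise the mean of the non-missing values.
# is.na() is TRUE for both NA and NaN, so NaN counts as missing.
mean_na_limit <- function(x, limit = 3) {
  if (sum(is.na(x)) > limit) NA_real_ else mean(x, na.rm = TRUE)
}

out <- df %>%
  group_by(Year, Week) %>%
  summarise(across(everything(), ~mean_na_limit(.x, limit = 3)),
            .groups = "drop")

# Week 44 has 5 missing values, so its mean is NA;
# week 45 has none, so its mean is the plain average (about 19.5).
out
```

Keeping the rule in one named function makes the threshold easy to change and lets the same helper be reused with `tapply()` or `aggregate()` if dplyr is not available.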
You can do this:
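df %>%
  group_by(Year, Week) %>%
  # NA^TRUE is NA and NA^FALSE is 1, so the mean turns into NA
  # only when more than 3 values in the group are missing
  summarise(across(everything(),
                   ~NA^(sum(is.na(.)) > 3) * mean(., na.rm = TRUE)),
            .groups = 'drop')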
#> # A tibble: 2 x 3
#>    Year  Week value
#>   <dbl> <dbl> <dbl>
#> 1  2017    44  NA
#> 2  2017    45  19.5
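For anyone puzzled by the `NA^(...)` expression, here is a short base R sketch of why it works. It relies on R's rule that any value raised to the power 0 is 1, including `NA`; the vectors `x` and `y` are made-up illustrations, not data from the question.

```r
# The logical condition is coerced to 0 (FALSE) or 1 (TRUE) by `^`
na_pow_false <- NA^FALSE   # same as NA^0, which is 1
na_pow_true  <- NA^TRUE    # same as NA^1, which is NA

# One missing value: the condition is FALSE, so the mean is multiplied by 1
x <- c(1, 2, NaN, 4)
res_x <- NA^(sum(is.na(x)) > 3) * mean(x, na.rm = TRUE)   # 7/3

# Four missing values: the condition is TRUE, so the mean is multiplied by NA
y <- c(1, NaN, NaN, NaN, NaN)
res_y <- NA^(sum(is.na(y)) > 3) * mean(y, na.rm = TRUE)   # NA
```

The trick is compact, but an explicit `if`/`else` states the same rule more readably, so which form to use is mostly a matter of taste.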