Skip na_interpolation on dplyr group/variable pairs that are entirely NA in R
I have a data frame that looks like this:
   Country Year acnt_class     wages
 3     AZE 2010         NA        NA
 4     AZE 2011  0.4206776        NA
 5     AZE 2012         NA        NA
 6     AZE 2013         NA        NA
 7     AZE 2014  0.7735889 0.4273174
 8     AZE 2015         NA        NA
 9     AZE 2016         NA        NA
10     AZE 2017  0.5108674 0.4335978
11     AZE 2018         NA        NA
15     BDI 2010         NA        NA
16     BDI 2011  0.3140646        NA
17     BDI 2012         NA        NA
18     BDI 2013         NA        NA
19     BDI 2014  0.1224175        NA
20     BDI 2015         NA        NA
21     BDI 2016         NA        NA
22     BDI 2017         NA        NA
23     BDI 2018         NA        NA
27     BEL 2010         NA        NA
28     BEL 2011  0.9576057        NA
29     BEL 2012         NA        NA
30     BEL 2013         NA        NA
31     BEL 2014  1.0083120 0.9623492
32     BEL 2015         NA        NA
33     BEL 2016         NA        NA
34     BEL 2017  1.0036910 0.9499486
35     BEL 2018         NA        NA
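I am trying to run this function using Stineman ("stine") interpolation across the two variable columns "acnt_class" and "wages":
library(dplyr)
library(imputeTS)
# fill in missing NAs by group
DF <- DF %>%
  group_by(Country) %>%
  mutate_at(.vars = c("acnt_class", "wages"),
            .funs = ~na_interpolation(., option = "stine"))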
It works as long as I run it on columns that have at least two observations in every group. Here, however, I run into this error:
Error in na_interpolation(., option = "stine") :
Input data needs at least 2 non-NA data point for applying na_interpolation
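This happens because the group "BDI" has only NAs for the variable "wages".
Ideally, I am looking for a modified function that will "skip" group/variable pairs that are entirely NA or have only one observation, and leave them as they are. Any solutions? Thanks!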
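Found a solution.
To interpolate only (leading and trailing NAs are left untouched):
library(imputeTS)
library(dplyr)
library(zoo)
DF <- DF %>%
  group_by(Country) %>%
  mutate_at(vars(acnt_class, wages),
            ~ if (sum(!is.na(.)) < 2) {
                .  # fewer than two observations: leave the values as they are
              } else {
                # na.approx() keeps leading/trailing NAs, so its NA positions mark what to reset
                replace(na_interpolation(., option = "stine"),
                        is.na(na.approx(., na.rm = FALSE)), NA)
              })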
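The answer provided by TiberiusGracchus2020 works well. In case it helps anyone, I have turned that snippet into a heavily commented function so that it is clearer what happens at each stage.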
# Modified version of imputeTS::na_interpolation that
# (1) doesn't break on vectors with fewer than two non-NA values
# (2) won't impute leading and trailing NAs
library(imputeTS)
library(zoo)
na_interpolation2 <- function(x, option = "linear") {
  total_not_missing <- sum(!is.na(x))
  # check there is sufficient data for na_interpolation
  if (total_not_missing < 2) {
    return(x)
  }
  # replace() takes an input vector, a T/F vector & a replacement value
  replace(
    # input vector is the interpolated data;
    # this also imputes leading/trailing NAs, which we don't want
    imputeTS::na_interpolation(x, option = option),
    # T/F vector: TRUE where na.approx() left an NA (leading/trailing positions)
    is.na(zoo::na.approx(x, na.rm = FALSE)),
    # replace those TRUE positions with NA in the input vector
    NA
  )
}
# example data
data1 <- c(NA, NA, NA, NA, NA)
data2 <- c(NA, NA, 1, NA, 3, NA)
na_interpolation(data1)
# Error in na_interpolation(data1) : Input data needs at
# least 2 non-NA data point for applying na_interpolation
na_interpolation(data2)
# [1] 1 1 1 2 3 3
na_interpolation2(data1)
# [1] NA NA NA NA NA
na_interpolation2(data2)
# [1] NA NA 1 2 3 NA
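If imputeTS or zoo are not available, the same guard can be sketched in base R. This is only an illustration under stated assumptions: `interp_or_skip` is a hypothetical helper name, and it swaps in linear interpolation through `stats::approx()` instead of the Stineman method from the question. The data frame is a small subset of the question's data (wages for AZE and BDI, 2014 to 2018).

```r
# Hypothetical helper (not part of imputeTS): linear interpolation that
# returns x unchanged when fewer than two non-NA values are present.
interp_or_skip <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) < 2) {
    return(x)  # nothing to interpolate between, so skip this vector
  }
  idx <- seq_along(x)
  # approx() with its default rule = 1 returns NA outside the observed
  # range, so leading and trailing NAs are preserved.
  stats::approx(idx[ok], x[ok], xout = idx)$y
}

# Subset of the question's data: AZE has two wage observations, BDI has none.
DF <- data.frame(
  Country = rep(c("AZE", "BDI"), each = 5),
  Year    = rep(2014:2018, times = 2),
  wages   = c(0.4273174, NA, NA, 0.4335978, NA, rep(NA_real_, 5))
)

# ave() applies the helper within each Country and keeps the row order.
DF$wages_filled <- ave(DF$wages, DF$Country, FUN = interp_or_skip)
print(DF)
```

In this sketch the BDI rows stay NA, the two gaps between the AZE observations are filled, and the trailing AZE value for 2018 remains NA because it lies outside the observed range.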