Interpolate based on multiple conditions in R
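Beginner here. I have a dataset of annual employment counts across different industry classifications and subregions. For some observations, the employee count is missing. I would like to fill in these values by linear interpolation (using na.approx or some other method). However, I only want to interpolate within the same industry classification and subregion.
For example, I have this: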
subregion <- c("East Bay", "East Bay", "East Bay", "East Bay", "East Bay", "South Bay")
industry <- c("A", "A", "A", "A", "A", "B")
year <- c(2013, 2014, 2015, 2016, 2017, 2002)
emp <- c(50, NA, NA, 80, NA, 300)
data <- data.frame(cbind(subregion, industry, year, emp))
subregion industry year emp
1 East Bay A 2013 50
2 East Bay A 2014 <NA>
3 East Bay A 2015 <NA>
4 East Bay A 2016 80
5 East Bay A 2017 <NA>
6 South Bay B 2002 300
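I need to produce the table below. The fifth observation should not be interpolated, because the observation that follows it belongs to a different subregion and industry.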
subregion industry year emp
1 East Bay A 2013 50
2 East Bay A 2014 60
3 East Bay A 2015 70
4 East Bay A 2016 80
5 East Bay A 2017 <NA>
6 South Bay B 2002 300
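Posts like this one have been helpful, but I can't figure out how to adapt the solutions so that two columns, rather than one, must match for interpolation to happen. Any help would be appreciated.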
We can apply na.approx (from zoo) by group:
library(tidyverse)
data %>%
group_by(subregion, industry) %>%
mutate(emp = zoo::na.approx(emp, na.rm = FALSE))
# A tibble: 6 x 4
# Groups: subregion, industry [2]
# subregion industry year emp
# <fct> <fct> <dbl> <dbl>
#1 East Bay A 2013 50
#2 East Bay A 2014 60
#3 East Bay A 2015 70
#4 East Bay A 2016 80
#5 East Bay A 2017 NA
#6 South Bay B 2002 300
Data (built without cbind(), so that year and emp stay numeric instead of being coerced to character)
data <- data.frame(subregion,industry,year, emp)
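One thing to keep in mind: na.approx(emp) on its own treats the rows of each group as equally spaced, so it relies on every group being sorted by year with no missing years. Below is a minimal base R sketch of the same grouped interpolation that uses year as the x-axis instead. The helper name interp_group and the column emp_filled are made up for illustration, and the sketch assumes each year appears at most once per subregion/industry pair.

```r
# Same example data as above, built without cbind() so emp and year are numeric
subregion <- c("East Bay", "East Bay", "East Bay", "East Bay", "East Bay", "South Bay")
industry <- c("A", "A", "A", "A", "A", "B")
year <- c(2013, 2014, 2015, 2016, 2017, 2002)
emp <- c(50, NA, NA, 80, NA, 300)
data <- data.frame(subregion, industry, year, emp)

# Hypothetical helper: linearly interpolate y over x within one group.
# Leading and trailing NAs are left as NA (rule = 1 does not extrapolate).
interp_group <- function(y, x) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(y)  # fewer than two known points: nothing to interpolate
  approx(x[ok], y[ok], xout = x, rule = 1)$y
}

# ave() splits the row indices by subregion and industry, then applies the
# helper to each group's emp and year values
data$emp_filled <- ave(
  seq_len(nrow(data)), data$subregion, data$industry,
  FUN = function(i) interp_group(data$emp[i], data$year[i])
)

data$emp_filled
# 50 60 70 80 NA 300
```

If you would rather stay with zoo, na.approx also accepts an x argument, so passing x = year inside the grouped mutate() should have the same effect.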