Interpolate based on multiple conditions in R

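Beginner user here. I have a dataset of annual employment counts broken down by industry classification and subregion. For some observations the employee count is missing (NA). I want to fill these values by linear interpolation (using na.approx or some other method). However, I only want to interpolate within the same industry classification and subregion.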

For example, I have this:

subregion <- c("East Bay", "East Bay", "East Bay", "East Bay", "East Bay", "South Bay")
industry <- c("A", "A", "A", "A", "A", "B")
year <- c(2013, 2014, 2015, 2016, 2017, 2002)
emp <- c(50, NA, NA, 80, NA, 300)

data <- data.frame(cbind(subregion,industry,year, emp))

  subregion industry year  emp
1  East Bay        A 2013   50
2  East Bay        A 2014 <NA>
3  East Bay        A 2015 <NA>
4  East Bay        A 2016   80
5  East Bay        A 2017 <NA>
6 South Bay        B 2002  300

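I need to produce this table, skipping interpolation for the fifth observation, because the only value after it (row 6) belongs to a different subregion and industry.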

  subregion industry year  emp
1  East Bay        A 2013   50
2  East Bay        A 2014   60
3  East Bay        A 2015   70
4  East Bay        A 2016   80
5  East Bay        A 2017 <NA>
6 South Bay        B 2002  300

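Posts like this were helpful, but I can't figure out how to adapt the solution so that two columns, rather than one, must match for interpolation to happen. Any help would be appreciated.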

We can do a group by on both columns and apply na.approx (from zoo) within each group:

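library(tidyverse)
data %>%
     # interpolate separately within each subregion/industry combination
     group_by(subregion, industry) %>%
     # na.rm = FALSE keeps leading/trailing NAs so the column length is unchanged
     mutate(emp = zoo::na.approx(emp, na.rm = FALSE))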
# A tibble: 6 x 4
# Groups:   subregion, industry [2]
#  subregion industry  year   emp
#  <fct>     <fct>    <dbl> <dbl>
#1 East Bay  A         2013    50
#2 East Bay  A         2014    60
#3 East Bay  A         2015    70
#4 East Bay  A         2016    80
#5 East Bay  A         2017    NA
#6 South Bay B         2002   300
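
One caveat with the call above: without an x argument, na.approx interpolates by position within the group. That matches interpolation by year only when each group is sorted by year and has exactly one row per year. A minimal sketch, assuming the years may be uneven, passes year as x. The data frame df and the columns emp_by_pos and emp_by_year are made up for illustration and are not part of the question's data:

```r
library(dplyr)

# Hypothetical data: the East Bay / A group has no row for 2015
df <- data.frame(
  subregion = c("East Bay", "East Bay", "East Bay", "South Bay"),
  industry  = c("A", "A", "A", "B"),
  year      = c(2013, 2014, 2016, 2002),
  emp       = c(50, NA, 80, 300)
)

res <- df %>%
  arrange(subregion, industry, year) %>%   # sort so x is increasing within each group
  group_by(subregion, industry) %>%
  mutate(
    emp_by_pos  = zoo::na.approx(emp, na.rm = FALSE),            # spacing = row position
    emp_by_year = zoo::na.approx(emp, x = year, na.rm = FALSE)   # spacing = actual year
  ) %>%
  ungroup()

res
```

For 2014, emp_by_pos gives 65 (halfway between rows 1 and 3), while emp_by_year gives 60 (one third of the way from 2013 to 2016). With the question's data, where every year is present, the two agree.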

data

data <- data.frame(subregion,industry,year, emp)
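
The reason for building the data frame this way, rather than with data.frame(cbind(...)) as in the question: cbind() on a mix of character and numeric vectors returns a character matrix, so emp stops being numeric and cannot be interpolated. A quick check, where the names bad and good are just for illustration:

```r
subregion <- c("East Bay", "East Bay", "East Bay", "East Bay", "East Bay", "South Bay")
industry  <- c("A", "A", "A", "A", "A", "B")
year      <- c(2013, 2014, 2015, 2016, 2017, 2002)
emp       <- c(50, NA, NA, 80, NA, 300)

bad  <- data.frame(cbind(subregion, industry, year, emp))  # goes through a character matrix
good <- data.frame(subregion, industry, year, emp)         # keeps each column's own type

class(bad$emp)        # "character" in R >= 4.0, "factor" in earlier versions
is.numeric(bad$emp)   # FALSE
is.numeric(good$emp)  # TRUE
```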