How to select exact matches for a list of variables in order to append datasets

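I have a separate dataset for each wave of a survey. Each wave has its own data file and its own variable-name prefix. I am trying to import and append all the data files, keeping only the subset of variables I need. So far I am doing this: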

 var_list <- c("pidp", "jbsat", "jbhrs", "jbnssec8_dv", "panssec8_dv", "manssec8_dv", "paedqf", "maedqf", "qfhigh", "age_dv",
          "sex_dv", "psu", "strata", "employ", "jbhas", "jboff", "jbsem", "jbstat", "jbterm1", "jbterm2", "pjbptft", "fimnet_dv",
          "fimngrs_dv", "fimnlabnet_dv", "seearnnet_dv", "fimnmisc_dv", "fimnprben_dv", "fimninvent_dv", "fimnpen_dv", "fimnsben_dv", 
          "hhtype_dv", "livesp_dv", "nch14resp", "nmpsp_dv", "tenure_dv", "urban_dv", "jbsat", "health", "sf1", "scghqa",
          "scghqb", "scghqc", "scghqd", "scgqhe", "scgqhf", "scghqg", "scghqi", "scghqj", "scghqh", "scghql", "sclsat1", 
          "sclsat2", "sclsat3", "sclsat4", "indscus_lw", "indscub_xw")

Then I import the first wave's data, selecting these variables and removing the wave prefix:

 longfile <- read_dta(file=paste0(dir, "ukhls_w1/a_indresp.dta")) %>% 
 select(matches(var_list)) %>% 
 rename_at(vars(starts_with("a_")), ~str_replace(.,"a_", "")) %>% #remove the wave prefix
 mutate(wave = 1) 

At this point, I would simply use the following loop for the remaining waves:

for (wn in 2:10) {
wl <- paste0(letters[wn],"_") 
wave_data <- read_dta(paste0(dir, "ukhls_w", wn, "/", wl, "indresp.dta")) %>% 
select(matches(var_list)) %>% 
rename_at(vars(starts_with(wl)), ~str_replace(.,wl, "")) %>% # remove prefix wave 
mutate(wave = wn)
longfile <- rbind(longfile, wave_data)
}   

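The problem, however, is that some variable names match more than one column in the files for later waves. For example, wave 2 contains a variable called "nxtjbhrs", so it is also picked up when matching "jbhrs". This then causes an error in rbind, because the number of columns differs between waves.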
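
A minimal illustration of this behaviour, using a made-up data frame (the name `toy`, the column `b_other`, and all values are hypothetical; only `b_jbhrs` and `b_nxtjbhrs` mirror the names mentioned above, and `pidp` is assumed to carry no wave prefix). `matches()` treats each entry as a regular expression and finds it anywhere in a column name, whereas `any_of()` compares whole names:

```r
library(dplyr)

# Hypothetical stand-in for a wave-2 file: prefixed columns plus an unprefixed id
toy <- tibble(
  pidp       = 1:2,
  b_jbhrs    = c(37, 40),
  b_nxtjbhrs = c(35, 20),
  b_other    = c(0, 1)
)

wanted <- c("pidp", "jbhrs")

# matches() does a regex search, so "jbhrs" is also found inside "b_nxtjbhrs"
partial <- toy %>% select(matches(wanted)) %>% names()

# any_of() needs the full column name, so the prefix is pasted on first;
# names that do not exist in the data are silently skipped
exact <- toy %>% select(any_of(c(wanted, paste0("b_", wanted)))) %>% names()

print(partial)
print(exact)
```

Anchoring the regular expression (for example `matches(paste0("^b_", wanted, "$"))`) would be another way to get whole-name matching, but `any_of()` avoids building patterns at all.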

How can I select only exact matches in this case? Or, alternatively, force the datasets to append?

Thanks for your help!

# select by exact name (prefixed or unprefixed) rather than by regex match
select(any_of(c(var_list, paste0(wl, var_list))))
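
A rough sketch of how exact selection and a more forgiving append might fit together, under stated assumptions: `select_wave()` is a hypothetical helper name, the toy tibbles `w1` and `w2` stand in for the results of `read_dta()`, and `pidp` is assumed to be unprefixed. It uses `any_of()` for whole-name selection and `bind_rows()` in place of `rbind()`, since `bind_rows()` fills columns missing from a wave with `NA` rather than failing:

```r
library(dplyr)
library(stringr)

# Hypothetical helper: keep only exact (prefixed or unprefixed) names,
# strip the wave prefix, and record the wave number
select_wave <- function(data, prefix, vars, wave) {
  data %>%
    select(any_of(c(vars, paste0(prefix, vars)))) %>%
    rename_with(~ str_remove(.x, paste0("^", prefix)), .cols = starts_with(prefix)) %>%
    mutate(wave = wave)
}

vars <- c("pidp", "jbhrs", "jbsat")

# Toy stand-ins for the imported files; wave 2 has a look-alike column
# (b_nxtjbhrs) and lacks jbsat altogether
w1 <- tibble(pidp = 1:2, a_jbhrs = c(37, 40), a_jbsat = c(5, 6))
w2 <- tibble(pidp = 1:2, b_jbhrs = c(38, 40), b_nxtjbhrs = c(35, 20))

# bind_rows() matches columns by name and pads missing ones with NA
longfile <- bind_rows(
  select_wave(w1, "a_", vars, 1),
  select_wave(w2, "b_", vars, 2)
)

print(longfile)
```

In the loop from the question, the data frame returned by `read_dta()` would take the place of `w2`, with `wl` passed as `prefix` and `wn` as `wave`.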