SAS proc sql disqualify observations that have already been merged

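I am merging two datasets using 5 criteria sets, with the condition that if a match is made under criteria set n, those observations cannot be matched under any later criteria set x > n. For example, if a merge succeeds under the first criteria set between dataset1:observation 10 and dataset2:observation 15, then neither of those observations is eligible for a merge under any subsequent criteria set (second, third, fourth, fifth).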

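My approach so far has been to add a flag variable to the table created by the merge, merge that table back to both parent datasets, and then, for the next criteria set, require that the flag variable be missing. However, my datasets are very large, and this method fails with an "out of resources" error. Apologies if this is simple. Thanks in advance for reading.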

Sample of my current code for the first two criteria sets:

* Initialize parent datasets;
data work.parentdata1;
set lib.parentdata1;
run;

data work.parentdata2;
set lib.parentdata2;
run;

***************;
*Criteria set 1;
***************;
proc sql;
create table match_1 as
select *
from parentdata1 o, parentdata2 t
    where o.variable_A = t.variable_a
    and o.variable_B= t.variable_b
;
quit;

* Results dataset (to be used for later analysis);
data work.match_1;
    set match_1;
    match_quality = 1;
run;

* Dataset for merge with parent dataset 1;
data work.mergematched_1;
    set match_1;
    match_dummy = 1;
run;

* sort matched table by parent dataset 1 id to prepare for parent merge;
proc sort data = work.mergematched_1;
    by id1;
run;

* merge matched observations back to parent dataset 1 to disqualify from future criteria sets;
data work.parentdata1_a;
    merge work.parentdata1 work.mergematched_1;
    by id1;
run;

*sort matched table by parent dataset 2 id to prepare for parent merge;
proc sort data = work.mergematched_1;
    by id2;
run;

* merge matched observations back to parent dataset 2 to disqualify from future criteria sets;
data work.parentdata2_a;
    merge work.parentdata2 work.mergematched_1;
    by id2;
run;
***************;
*Criteria set 2;
***************;
proc sql;
create table match_2 as
select *
from parentdata1_a o, parentdata2_a t
where o.match_dummy = . and t.match_dummy = .
and o.variable_X = t.variable_x
and o.variable_Y= t.variable_y
;
quit;

* Results dataset (to be used for later analysis);
data work.match_2;
set match_2;
match_quality = 2;
run;

* Dataset for merge with parent dataset 1a;
data work.mergematched_2;
set match_2;
match_dummy = 1;
run;

* sort matched table by parent dataset 1a id to prepare for parent merge;
proc sort data = work.mergematched_2;
by id1;
run;

* merge matched observations back to parent dataset 1a to disqualify from future criteria sets;
data work.parentdata1_b;
merge work.parentdata1_a work.mergematched_2;
by id1;
run;

*sort matched table by parent dataset 2a id to prepare for parent merge;
proc sort data = work.mergematched_2;
by id2;
run;

*merge matched observations back to parent dataset 2a to disqualify from future criteria sets;
data work.parentdata2_b;
merge work.parentdata2_a work.mergematched_2;
by id2;
run;

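Since you are only using 3 or 4 values in the merge, try dropping all the other variables and keeping only the ones used for the merge, then merge all the other variables back in at the end.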
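As a rough sketch of that idea (not tested against your data), the join could read only the IDs and the key variables through the `keep=` dataset option and pull the remaining columns back in once all criteria sets are done. This assumes `id1` and `id2` uniquely identify rows in their parent datasets; `final_1` is just a placeholder name.

```sas
/* Join on the keys only: each input carries its ID plus the      */
/* matching variables, so the intermediate table stays narrow.    */
/* Assumption: id1 / id2 uniquely identify rows in each parent.   */
proc sql;
    create table match_1 as
    select o.id1, t.id2, 1 as match_quality
    from parentdata1(keep=id1 variable_A variable_B) as o
         inner join
         parentdata2(keep=id2 variable_a variable_b) as t
         on  o.variable_A = t.variable_a
         and o.variable_B = t.variable_b
    ;
quit;

/* At the very end, bring the remaining variables back by ID.     */
/* final_1 is a hypothetical name. Columns that share a name in   */
/* both parents are kept only once, and SAS logs a warning.       */
proc sql;
    create table final_1 as
    select m.match_quality, o.*, t.*
    from match_1 as m
         inner join lib.parentdata1 as o on m.id1 = o.id1
         inner join lib.parentdata2 as t on m.id2 = t.id2
    ;
quit;
```

Whether this is enough to avoid the "out of resources" error depends on how wide the parent datasets are and how many rows match, so treat it as something to try rather than a guaranteed fix.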

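I also don't know whether you need both match_1 and mergematched_1, since apart from adding another variable the second step doesn't appear to change the file. If you are running out of space, avoid creating temporary datasets wherever possible.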

I think you could also combine several steps into one, as below. This replaces the first four proc/data steps under criteria set 1.

proc sql;
create table match_1 as
select *, 1 as match_quality, 1 as match_dummy
from parentdata1 o
inner join parentdata2 t
on o.variable_A = t.variable_a
and o.variable_B= t.variable_b
order by id1
;
quit;
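
For the next criteria set, one option that avoids merging a flag back into the parents at all is to exclude already-matched IDs with a subquery. The following is only a sketch under the assumption that match_1 contains both `id1` and `id2` (it does if it was built with `select *` as above); I have not run it on data of your size.

```sas
/* Criteria set 2: join the original parents, but skip any row    */
/* whose ID already appears in match_1. No flag variable and no   */
/* parentdata1_a / parentdata2_a copies are needed.               */
proc sql;
    create table match_2 as
    select o.id1, t.id2, 2 as match_quality
    from parentdata1 as o
         inner join parentdata2 as t
         on  o.variable_X = t.variable_x
         and o.variable_Y = t.variable_y
    where o.id1 not in (select id1 from match_1)
      and t.id2 not in (select id2 from match_1)
    ;
quit;
```

For criteria sets 3 to 5 the subqueries would have to cover every earlier match table, for example by stacking the matched `id1`/`id2` pairs from each round into a single table and checking against that.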