SAS data update / manipulation
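I am fairly new to SAS programming, but I have been learning the basics over the past few months and it has met my needs. I have now run into a problem and need some help. I am trying to update a database and create two new variables that help track the updates. I have simplified my problem with the tables below: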
Source Table
ID    Record_ID  Correction_ID
0001  A001
0002  A002
0003  A003       A001
0004  A004       A002
0005  A005
0006  A006       A004
Target Table
ID    Record_ID  Correction_ID  Original_Record  Count
0001  A001                      A001             0
0002  A002                      A002             0
0003  A003       A001           A001             1
0004  A004       A002           A002             1
0005  A005                      A005             0
0006  A006       A004           A002             2
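Correction_ID identifies the record that the current record is trying to correct/modify.
Count is the number of updates that have been applied to the original record.
Thanks.
Edit
The PROC SQL code I tried, which did not work: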
Table 1
ID    Record_ID  Correction_ID  Original_Record  Count
0001  A001                      A001             0
0002  A002                      A002             0
0005  A005                      A005             0
Table 2
ID    Record_ID  Correction_ID  Original_Record  Count
0003  A003       A001
0004  A004       A002
0006  A006       A004
SELECT ID,
Record_ID, *how to include ID from both table? Or don’t even separate?
Correction_ID, *same as above
CASE
WHEN Correction_ID is null THEN One.Original_Record
ELSE (SELECT Original_Record FROM One WHERE Two.Correction_ID=One.Record_ID)
END as Original_Record,
CASE
WHEN Count is not null THEN One.Count
ELSE (SELECT Count FROM One WHERE Two.Correction_ID=One.Record_ID) + 1
END as Count;
FROM Table 1 AS One, Table 2 AS Two;
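The following code seems to work for your data. It uses a hash object in which 'Original_Record' is carried forward and 'count' is incremented. Some elements may be redundant now ('_start' is probably not needed).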
data have;
    infile cards truncover;
    /* all three variables are read as character, up to 8 bytes each */
    input (ID Record_ID Correction_ID) (:$8.);
    cards;
0001 A001
0002 A002
0003 A003 A001
0004 A004 A002
0005 A005
0006 A006 A004
0007 A007
0008 A008 A006
0009 A009 A003
;;;;
run;
data want;
    if _n_=1 then
        do;
            declare hash h();
            h.definekey('_end');
            h.definedata('_end', '_start', '_origin', 'count');
            h.definedone();
            length _end _start _origin $ 8;
            /*call missing (of _:, count);*/
        end;
    set have;
    if missing (correction_id) then
        do;
            original_record=record_id;
            count=0;
        end;
    else
        do;
            rc=h.find(key:correction_id);
            if rc ne 0 then
                do;
                    /*if there is no match, this is the first modification:
                      '_origin' is set to the value of correction_id, count is set to 1*/
                    _origin=correction_id;
                    count=1;
                end;
            else
                do;
                    /*if there is a match, '_origin' stays the same, so no
                      operation is needed, but count is increased by 1*/
                    count=count+1;
                end;
            _end=record_id;
            _start=correction_id;
            Original_Record=_origin;
            rc=h.replace();
        end;
    drop rc _:;
run;
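The step above fills the hash as it reads, so it relies on each correction appearing after the record it corrects. If that ordering cannot be guaranteed, one possible variation is to load every Record_ID/Correction_ID pair first and then walk each chain back to its start. This is only a sketch under assumptions: the output name want_lookup, the helper variables _key and _parent, and the 1000-step guard against circular references are made up for illustration, and it assumes Record_ID and Correction_ID are character variables of length 8 as in the sample data set.

```sas
data want_lookup;
    set have;
    length Original_Record _key _parent $ 8;
    if _n_ = 1 then
        do;
            /* load every Record_ID -> Correction_ID pair before processing */
            declare hash h(dataset: "have(rename=(Record_ID=_key Correction_ID=_parent))");
            h.definekey('_key');
            h.definedata('_parent');
            h.definedone();
            call missing(_key);
        end;
    /* start at the current record and follow the chain of corrections */
    Original_Record = Record_ID;
    Count = 0;
    _parent = Correction_ID;
    do while (not missing(_parent) and Count < 1000); /* hypothetical guard against cycles */
        Original_Record = _parent;
        Count = Count + 1;
        /* on success, find() overwrites _parent with the next link in the chain */
        if h.find(key: _parent) ne 0 then leave;
    end;
    drop _:;
run;
```

With this approach Count is the number of links between a record and the start of its chain, which matches the parent's count plus one used above. The trade-off is that the whole lookup table sits in memory.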
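If you have a SAS/OR license, this code does roughly the same thing but is simpler, because PROC OPTMODEL arrays are hashes. It does load all the data into RAM, so the price of the simplicity is memory consumption.
I will reuse Haikuo's data set: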
data have;
    infile cards truncover;
    /* all three variables are read as character, up to 8 bytes each */
    input (ID Record_ID Correction_ID) (:$8.);
    cards;
0001 A001
0002 A002
0003 A003 A001
0004 A004 A002
0005 A005
0006 A006 A004
0007 A007
0008 A008 A006
0009 A009 A003
;
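I don't think we really need ID, so I left it out to keep the code more illustrative. It is not used internally, but you can add it to the read data and create data statements if you need it.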
proc optmodel;
    set<str,str> RECORDS;
    set ALL = setof{<i,j> in RECORDS} i;
    str parent  {ALL diff {<i,('')> in RECORDS}};
    str original{i in ALL} init i;
    num count   {ALL} init 0;
    read data have into RECORDS=[Record_Id Correction_ID];
    for {<ri,rj> in RECORDS: rj ~= ''} do;
        parent  [ri] = rj;
        count   [ri] = count   [parent[ri]] + 1;
        original[ri] = original[parent[ri]];
    end;
    create data want from [Record_ID Correction_ID]=RECORDS
        Original_Record = original[Record_ID] Count = count[Record_ID];
quit;