SAS data update / manipulation

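I am fairly new to SAS programming, but I have been learning the basics over the past few months and it has served my needs so far. However, I am currently stuck and need some help. I am trying to update a database and create two new variables that help track the updates. I have simplified my problem using the tables below: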

Source Table

ID      Record_ID   Correction_ID
0001    A001    
0002    A002    
0003    A003        A001
0004    A004        A002
0005    A005    
0006    A006        A004

Target Table

ID          Record_ID   Correction_ID   Original_Record     Count
0001        A001                            A001                0
0002        A002                            A002                0
0003        A003            A001            A001                1
0004        A004            A002            A002                1
0005        A005                            A005                0
0006        A006            A004            A002                2

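Correction_ID indicates the record that the current record is trying to correct/amend.

Count indicates the number of updates that have been made to the original record.

Thank you.

Edit

The Proc SQL code I tried, which did not work: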

ID          Record_ID   Correction_ID   Original_Record     Count
Table 1
0001        A001                            A001            0
0002        A002                            A002            0
0005        A005                            A005            0

Table 2
0003        A003            A001        
0004        A004            A002        
0006        A006            A004        

SELECT  ID,
        Record_ID, *how to include ID from both table? Or don’t even separate? 
        Correction_ID, *same as above
        CASE
            WHEN Correction_ID is null THEN One.Original_Record
                ELSE (SELECT Original_Record FROM One WHERE Two.Correction_ID=One.Record_ID)
        END as Original_Record,
        CASE
            WHEN Count is not null THEN One.Count
                ELSE (SELECT Count FROM One WHERE Two.Correction_ID=One.Record_ID) + 1
        END as Count;
        FROM Table 1 AS One, Table 2 AS Two;

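The following code seems to work for your data. It uses a hash object in which 'Original_Record' is carried forward and 'count' is incremented. Some elements may be redundant by now ('_start' is probably not needed).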

data have;
    infile cards truncover;
    input (ID      Record_ID   Correction_ID) (:$8.);
    cards;
0001    A001    
0002    A002    
0003    A003        A001
0004    A004        A002
0005    A005    
0006    A006        A004
0007    A007        
0008    A008        A006
0009    A009        A003
;;;;
run;

data want;
    if _n_=1 then
        do;
            declare hash h();
            h.definekey('_end');
            h.definedata('_end', '_start', '_origin', 'count');
            h.definedone();
            length _end _start _origin $ 8;
            /*call missing (of _:, count);*/
        end;

    set have;

    if missing (correction_id) then
        do;
            original_record=record_id;
            count=0;
        end;
    else
        do;
            rc=h.find(key:correction_id);

            if rc ne 0 then
                do;
                    /* No match: this is the first correction of the record, so
                       '_origin' is set to correction_id and count is set to 1. */
                    _origin=correction_id;
                    count=1;
                end;
            else
                do;
                    /* Match: '_origin' was loaded from the hash and stays the
                       same, so only count is increased by 1. */
                    count=count+1;
                end;

            _end=record_id;
            _start=correction_id;
            Original_Record=_origin;
            rc=h.replace();
        end;

    drop rc _:;
run;
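
One caveat worth noting: the hash lookup can only follow a chain through records that have already been read, so the step above relies on every corrected record appearing in have before the record that corrects it. A minimal sketch of enforcing that is shown below. It rests on an assumption that is not stated in the question, namely that Record_ID values sort in the order the records were created.

```sas
/* Assumption: Record_ID values sort in creation order, so a corrected
   record always precedes the record that corrects it. */
proc sort data=have;
    by Record_ID;
run;

/* Rerun the "data want" step above after sorting. */
```

With the sample data the rows are already in this order, so the sort changes nothing. Tracing the step by hand on the two extra rows, A008 should resolve to A002 with a count of 3, and A009 to A001 with a count of 2.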

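If you have a SAS/OR license, this code does much the same thing but is simpler, because PROC OPTMODEL arrays are hashes. It does load all the data into RAM, so the price of that simplicity is memory consumption.

I will reuse Haikuo's dataset: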

data have;
    infile cards truncover;
    input (ID      Record_ID   Correction_ID) (:$8.);
    cards;
    0001    A001    
    0002    A002    
    0003    A003        A001
    0004    A004        A002
    0005    A005    
    0006    A006        A004
    0007    A007        
    0008    A008        A006
    0009    A009        A003
;

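I don't think we really need ID, so I left it out to keep the code more illustrative. It is not used internally, but if you need it you can add it to the read data and create data statements.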

proc optmodel;
    set<str,str> RECORDS;
    set ALL = setof{<i,j> in RECORDS} i;
    str parent  {ALL diff {<i,('')> in RECORDS}};
    str original{i in ALL} init i;
    num count   {     ALL} init 0;

    read data have into RECORDS=[Record_Id Correction_ID];
    for {<ri,rj> in RECORDS: rj ~= ''} do;
        parent  [ri] = rj;
        count   [ri] = count   [parent[ri]] + 1;
        original[ri] = original[parent[ri]];
    end;
    create data want from [Record_ID Correction_ID]=RECORDS 
        Original_Record = original[Record_ID] Count = count[Record_ID];
quit;
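
Whichever approach is used, one way to sanity-check the result is to confirm that every Original_Record is itself an uncorrected record. The query below is only a sketch and assumes the have and want data sets built above. It lists any row whose Original_Record still carries a Correction_ID in have, which would mean the chain was not traced all the way back.

```sas
proc sql;
    /* Any row returned here points at an Original_Record that is itself
       a correction, i.e. the chain was not followed to its root. */
    select w.Record_ID, w.Original_Record
    from want as w
         inner join have as h
         on w.Original_Record = h.Record_ID
    where h.Correction_ID is not missing;
quit;
```

If the chains were resolved completely, the query should return no rows.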