Retrieve data by row with lag function

Good morning. I have this dataset:

Appendix | Change_Serial_Number| Status     | Duration | Mileage  | Service
20101234        0                   .            60       120000       Z
20101234        1                 Proposed       48       110000       Z
20101234        2                 Activated      24        90000       Z
20101234        3                 Proposed       60       120000       Z
20101234        4                 Proposed       50       160000       B
20101234        5                 Activated      36       110000       B

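Each row is a variation, which can be either activated or only proposed. A variation is applied on top of the first row (the one with a blank status) or on top of the previously activated variation. I need this table: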

Appendix | Change_Serial_Number| Status     | Duration | Mileage  | Service |Duration_Prev| Mileage_prev |
20101234        0                   .            60       120000       Z        .               .
20101234        1                 Proposed       48       110000       Z        60              120000
20101234        2                 Activated      24        90000       Z        60              120000
20101234        3                 Proposed       60       120000       Z        24              90000
20101234        4                 Proposed       50       160000       B        24              90000
20101234        5                 Activated      36       110000       B        24              90000

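I need to compare the duration, mileage and service of each variation with those of the previously activated variation, or with the initial condition only if no variation has been activated yet.

I tried using the lag function to retrieve the data from the previous row, but I need to retrieve 3 fields, and only from the last activated variation or, if there is none, from the initial condition.

I used this code: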

proc sort data=db_rdg;
       by Appendix Change_Serial_Number;
  run;

  data db_rdg2;
       set db_rdg;
  by Appendix;
  Duration_prev=lag(Duration);
  if first.Appendix then Duration_prev =.;
  run;

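With this code I can only retrieve data from the previous row (not from the previously activated row or from the initial condition), and only for the Duration variable (not for duration, mileage and service together).

I hope I have been clear enough :)

Thank you for your help!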

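Instead of using LAG to retrieve the duration from the prior row, you will want to store the activation-state tracking variables (duration, mileage and serial) in variables that are retained and that are updated after an explicit OUTPUT.

In both sample codes I also track the serial number, because you might want to know the number of changes since the prior activation.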

data have;
  * Status $ defaults to length 8, so the value Activated is stored as Activate;
  input Appendix Change_Serial_Number Status $ Duration Mileage Service $;
datalines;
20101234        0                   .            60       120000       Z
20101234        1                 Proposed       48       110000       Z
20101234        2                 Activated      24        90000       Z
20101234        3                 Proposed       60       120000       Z
20101234        4                 Proposed       50       160000       B
20101234        5                 Activated      36       110000       B
run;

* NOTE: _APA suffix means @ prior activate;

* version 1;
* implicit loop with by group processing means ;
* explicit first. test needed in order to reset the apa tracking variables;

data want;
  set have;
  by appendix;

  if first.appendix then do;
     length csn_apa dur_apa mil_apa 8;
     call missing(csn_apa, dur_apa, mil_apa);    
  end;

  output;

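  /* =: compares on the shorter length, so 'Activate' (truncated) and 'Activated' both match */
  if missing(status) or status =: 'Activate' then do;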
    csn_apa = change_serial_number;
    dur_apa = duration;
    mil_apa = mileage;
  end;

  retain csn_apa dur_apa mil_apa;
run;

* version 2;
* DOW version;
* explicit loop over group means first. handling not explicitly needed;
* implicit loop performs tracking variable resets;
* retain not needed because output and tracking variables modified;
* within current iteration of implicit loop;

data want2;
  do until (last.appendix);
    set have;
    by appendix;

    output;

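    /* =: compares on the shorter length, so 'Activate' (truncated) and 'Activated' both match */
    if missing(status) or status =: 'Activate' then do;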
      csn_apa = change_serial_number;
      dur_apa = duration;
      mil_apa = mileage;
    end;
  end;
run;
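
The question also asks for Service and for the column names shown in the desired table. The following is a minimal sketch of version 1 extended that way, not a definitive implementation. The dataset names `have_demo` and `want_demo` and the columns `Service_Prev`, `CSN_Prev` and `Changes_Since` are illustrative names that do not appear in the question. It also assumes Status is read with length 9 so that 'Activated' is not truncated.

```sas
/* Assumption: Status is read with :$9. so 'Activated' fits without truncation */
data have_demo;
  input Appendix Change_Serial_Number Status :$9. Duration Mileage Service $;
datalines;
20101234 0 .         60 120000 Z
20101234 1 Proposed  48 110000 Z
20101234 2 Activated 24  90000 Z
20101234 3 Proposed  60 120000 Z
20101234 4 Proposed  50 160000 B
20101234 5 Activated 36 110000 B
;
run;

data want_demo;
  set have_demo;
  by Appendix;

  length Duration_Prev Mileage_Prev CSN_Prev 8 Service_Prev $8;
  retain Duration_Prev Mileage_Prev CSN_Prev Service_Prev;

  /* start every Appendix with an empty baseline */
  if first.Appendix then
    call missing(Duration_Prev, Mileage_Prev, CSN_Prev, Service_Prev);

  /* illustrative extra column: number of changes since the baseline row */
  if not missing(CSN_Prev) then Changes_Since = Change_Serial_Number - CSN_Prev;
  else Changes_Since = .;

  /* write the row first, so it carries the baseline that was valid before it */
  output;

  /* the initial row (blank status) and every activated row become the new baseline */
  if missing(Status) or Status = 'Activated' then do;
    CSN_Prev      = Change_Serial_Number;
    Duration_Prev = Duration;
    Mileage_Prev  = Mileage;
    Service_Prev  = Service;
  end;
run;
```

With the sample data, rows 1 and 2 carry the initial condition (60, 120000, Z) and rows 3 to 5 carry the values activated at serial 2 (24, 90000, Z), matching the table requested in the question.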

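The lag() function is really only useful when you need a value from a fixed number of observations back. In this case you don't know whether the value you want comes from the previous observation or from five or six observations earlier, so instead of using lag() you should RETAIN additional variables and update their values when appropriate: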

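data db_rdg2;
  set db_rdg;
  by Appendix;
  retain duration_prev mileage_prev;
  * clear the carried values at the start of each Appendix group;
  if first.Appendix then call missing(duration_prev, mileage_prev);
  output; * write the row before updating, so it shows the prior values;
  if first.Appendix or status = 'Activated' then do;
    duration_prev = duration;
    mileage_prev  = mileage;
  end;
run;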

The RETAIN statement lets duration_prev keep its value as each new observation is read from the input, rather than being reset to missing.
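
To see the difference RETAIN makes in isolation, here is a small sketch. The dataset `nums` and the variables `total_kept` and `total_reset` are made up for illustration. Both variables are updated the same way, but only one of them is retained.

```sas
data nums;
  input x;
datalines;
1
2
3
;
run;

data retain_demo;
  set nums;
  retain total_kept 0;

  /* retained: the value survives into the next iteration, giving a running sum */
  total_kept = sum(total_kept, x);

  /* not retained: reset to missing at the top of every iteration, */
  /* so sum() simply returns x each time                           */
  total_reset = sum(total_reset, x);
run;
```

Here `total_kept` accumulates to 1, 3, 6 across the three rows, while `total_reset` is just 1, 2, 3.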

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000214163.htm