How to split one variable into multiple rows

**Application_id                Reaon_code            Value** 
123                              AB31AB45                £500
124                              AB43RD49TY87            £640
125                              RT87                    £900
126                              CD19RV29               £1000

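What I want is to split the reason_code variable by taking substrings of it. Every reason is always exactly 4 characters: 2 letters followed by 2 digits.

The dataset I want to end up with looks like this: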

Application_id  Reason_code  Value
123             AB31         £500
123             AB45         £500
124             AB43         £640
124             RD49         £640
124             TY87         £640
125             RT87         £900

I hope that makes sense.

As a second question, I would like to create flags showing:

Application_id  Reason_code  Value  Waterfall_reason           Unique_Reason
123             AB31         £500   1 (as it hits AB31 first)  0 (as it hits both AB31 and AB45)
123             AB45         £500   0 (as it hits AB31 first)  0 (as it hits both AB31 and AB45)
124             AB43         £640   1 (as it hits AB43 first)  0 (as it hits AB43, RD49 and TY87)
124             RD49         £640   0                          0
124             TY87         £640   0                          0
125             RT87         £900   1 (as it hits RT87 first)  1 (as it ONLY hits RT87)

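Assuming all codes are 4 characters long, a simple DO loop does the job. Keep taking the first four characters until the string is empty. If you create a variable with a length of only 4 and assign a longer string to it, only the first four characters fit. You can then use the SUBSTR() function to remove the first four characters before the next pass through the loop.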

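data have ;
  /* Read the concatenated codes as one character value of up to 20 characters */
  input ID Reason_Code :$20. Value ;
cards;
123 AB31AB45 500
124 AB43RD49TY87 640
125 RT87 900
126 CD19RV29 1000
;;;;

data want ;
  set have (rename=(reason_code=reason_list));
  /* A length of 4 means each assignment keeps only the first four characters */
  length Reason_code $4 Waterfall_reason 8 Unique_reason 8;
  /* 1 when the list holds a single code, otherwise 0 */
  unique_reason = length(reason_list)<= 4;
  /* 1 for the first code in the list, reset to 0 after the first OUTPUT */
  waterfall_reason= 1;
  do until (reason_list=' ');
    reason_code = reason_list ;
    output;
    waterfall_reason=0;
    /* Drop the code just written and shift the remainder to the left */
    reason_list = substr(reason_list,5);
  end;
run;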
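
To see the truncate-and-shift idea in isolation, here is a minimal sketch that walks through a single hard-coded list and writes each step to the log. It is only an illustration under the same assumption as above (every code is exactly 4 characters); the literal value and the use of `data _null_` are my choices for the demo, not part of the answer's data.

```sas
data _null_;
  length reason_list $20 reason_code $4;
  reason_list = 'AB43RD49TY87';            /* demo value only */
  do until (reason_list = ' ');
    reason_code = reason_list;             /* only the first 4 characters fit */
    put reason_code= reason_list=;
    reason_list = substr(reason_list, 5);  /* remove the code just taken */
  end;
run;
/* Log:
   reason_code=AB43 reason_list=AB43RD49TY87
   reason_code=RD49 reason_list=RD49TY87
   reason_code=TY87 reason_list=TY87
*/
```

Because `reason_list` is padded with blanks to its full length of 20, `substr(reason_list, 5)` stays valid even when only one code is left.
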
 Data have;
  /* Informat widths are reconstructed: $3. for the id, $20. for the code list */
  informat Application_id $3. Reaon_code $20. Value NLMNLGBP.;
  input Application_id Reaon_code Value;
  format Value NLMNLGBP.;
 cards;
123                              AB31AB45                £500
124                              AB43RD49TY87            £640
125                              RT87                    £900
126                              CD19RV29               £1000
 ;

 Data Want;
  set have;
  length Reason_code $4;
  OrigCode = Reaon_Code;
  keep Application_id Reason_code Value;
  /* Step through the list 4 characters at a time; a DO WHILE or DO UNTIL would also work */
  do Start = 1 to length(Reaon_Code) by 4;
      Reason_code = substr(Reaon_Code, Start, 4);
      if Reason_code = '' then leave;   /* nothing left to split */
      output;
  end;
 run;
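
The step above only splits the codes. As a rough sketch of how the two flags from the second question could be added to this position-based loop, something like the following might work. It assumes every code is exactly 4 characters with no blanks inside the list. The names `have_demo`, `Reason_list`, `n_codes` and `want_flags` are made up for the illustration, and the values are read as plain numbers to keep the example independent of the currency informat.

```sas
/* Hypothetical demo data, not the original table */
data have_demo;
  input Application_id Reason_list :$20. Value;
cards;
123 AB31AB45 500
124 AB43RD49TY87 640
125 RT87 900
;

data want_flags;
  set have_demo;
  length Reason_code $4;
  /* Number of 4-character codes in the list */
  n_codes = ceil(length(Reason_list) / 4);
  /* 1 only when the application hit a single reason */
  Unique_Reason = (n_codes = 1);
  do i = 1 to n_codes;
    Reason_code = substr(Reason_list, 4*i - 3, 4);
    /* 1 only for the first reason that was hit */
    Waterfall_reason = (i = 1);
    output;
  end;
  keep Application_id Reason_code Value Waterfall_reason Unique_Reason;
run;
```

Counting the codes up front means the loop needs no LEAVE, and both flags fall out of the loop index and the count.
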

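Here is another approach using regular expressions, based on a different assumption: that your substrings follow a letters + digits pattern rather than a fixed 4-character layout. The code below picks out the strings that match the letters + digits pattern (which covers the 2 letters + 2 digits case) one after another until the whole input string has been used up. 'waterfall_reason' is flagged only for the first substring picked, and 'unique_reason' is derived with countw(), using letters as the delimiters.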

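data have;
    /* Test rows deliberately vary the number of letters and digits per code */
    input ID Reason_Code :$20. Value;
    cards;
123 ABcd31AB45 500
124 AB43RD49T87 640
125 RT87 900
126 C19RV29 1000
;;;;

data want;
    set have;
    /* One or more letters followed by one or more digits, case-insensitive */
    _pat=prxparse('/[a-z]+[0-9]+/io');
    _start=1;
    _stop=length(reason_code);
    /* The 'a' modifier makes letters the delimiters, so each digit run is one word */
    unique_reason=ifn(countw(reason_code,,'a')=1,1,0);

    do _n=1 by 1 until (_pos = 0);
        /* Find the next match; _pos is 0 when no match is left */
        call prxnext(_pat,_start,_stop,reason_code,_pos,_len);

        /* Guard SUBSTR so it is never called with position 0 */
        if _pos > 0 then do;
            new_code=substr(reason_code,_pos, _len);
            waterfall_reason=ifn(_n=1,1,0);
            output;
        end;
    end;

    drop _:;
run;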
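
If it helps to see what the pattern and countw() do on their own, here is a small sketch that runs both against one hard-coded value and prints the results to the log. The value `'ABcd31AB45'` is taken from the test data above; the variable names and the `data _null_` wrapper are just for this illustration.

```sas
data _null_;
  length code $20 piece $20;
  code  = 'ABcd31AB45';
  pat   = prxparse('/[a-z]+[0-9]+/i');
  start = 1;
  stop  = length(code);
  /* Ask for the first match, then keep going until PRXNEXT returns position 0 */
  call prxnext(pat, start, stop, code, pos, len);
  do while (pos > 0);
    piece = substr(code, pos, len);
    put piece= pos= len=;
    call prxnext(pat, start, stop, code, pos, len);
  end;
  /* Letters are treated as delimiters, so this counts the digit runs */
  n_codes = countw(code, , 'a');
  put n_codes=;
run;
/* Log:
   piece=ABcd31 pos=1 len=6
   piece=AB45 pos=7 len=4
   n_codes=2
*/
```

Calling PRXNEXT once before the loop and again at the end of each pass is the usual way to avoid touching SUBSTR after the last match.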