How to split one variable into multiple rows
**Application_id Reason_code Value**
123 AB31AB45 £500
124 AB43RD49TY87 £640
125 RT87 £900
126 CD19RV29 £1000
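What I want is to split the reason_code variable into its individual codes. Each reason is always exactly 4 characters long, made up of 2 letters followed by 2 digits.
The dataset I want to end up with looks like this: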
Application_id Reason_code Value
123 AB31 £500
123 AB45 £500
124 AB43 £640
124 RD49 £640
124 TY87 £640
125 RT87 £900
I hope this makes sense.
As a second question, I would like to create two flags, like this:
Application_id Reason_code Value Waterfall_reason Unique_Reason
123 AB31 £500 1 (as it hits AB31 first) 0 (as it hits both AB31 and AB45)
123 AB45 £500 0 (as it hits AB31 first) 0 (as it hits both AB31 and AB45)
124 AB43 £640 1 (as it hits AB43 first) 0 (as it hits AB43, RD49 and TY87)
124 RD49 £640 0 0
124 TY87 £640 0 0
125 RT87 £900 1 (as it hits RT87 first) 1 (as it only hits RT87)
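Assuming every code is exactly 4 characters long, a simple DO loop will do the job. Keep taking the first four characters until the string is empty. If you create a variable with a length of only 4 and assign a longer string to it, only the first four characters are stored. You can then use the SUBSTR() function to remove the first four characters before the next pass through the loop.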
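data have ;
  /* the $20. width is an assumption; the width was lost from the original post */
  input ID Reason_Code :$20. Value ;
cards;
123 AB31AB45 500
124 AB43RD49TY87 640
125 RT87 900
126 CD19RV29 1000
;;;;

data want ;
  set have (rename=(reason_code=reason_list));
  /* Reason_code is only 4 characters long, so assigning the list truncates it */
  length Reason_code $4 Waterfall_reason 8 Unique_reason 8;
  /* 1 when the list holds a single code, otherwise 0 */
  unique_reason = length(reason_list)<= 4;
  /* 1 for the first code output, reset to 0 afterwards */
  waterfall_reason= 1;
  do until (reason_list=' ');
    reason_code = reason_list ;
    output;
    waterfall_reason=0;
    /* drop the code just written and keep the rest of the list */
    reason_list = substr(reason_list,5);
  end;
run;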
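The two building blocks of that loop, truncation on assignment and SUBSTR() called with only a start position, can be looked at in isolation. This is a minimal sketch; the variable names first_code and rest are made up for illustration and do not appear in the answer above.

```sas
data _null_;
  /* first_code can hold only 4 characters; rest holds the whole list */
  length first_code $4 rest $20;
  rest = 'AB43RD49TY87';
  first_code = rest;        /* only the first four characters are stored */
  rest = substr(rest, 5);   /* everything from position 5 onwards */
  put first_code= rest=;    /* writes both values to the log */
run;
```

The log should show first_code=AB43 and rest=RD49TY87, which is exactly one pass of the loop in the step above.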
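data have;
  /* the $3. and $50. widths are assumptions; the widths were lost from the original post */
  informat Application_id $3. Reaon_code $50. Value NLMNLGBP.;
  input Application_id Reaon_code Value;
  format Value NLMNLGBP.;
cards;
123 AB31AB45 £500
124 AB43RD49TY87 £640
125 RT87 £900
126 CD19RV29 £1000
;

data want;
  /* Reaon_code is the input variable holding the concatenated list;
     Reason_code is the new 4-character output variable (the $4. width is assumed) */
  format Application_id $3. Reason_code $4. Value NLMNLGBP.;
  set have;
  OrigCode = Reaon_Code;
  keep Application_id Reason_code Value;
  /* 25 is an arbitrarily high upper bound; a DO WHILE or DO UNTIL would also work */
  do Start = 1 to 25 by 4;
    Reason_code = substr(Reaon_Code, Start, 4);
    /* stop as soon as the list is exhausted */
    if Reason_code = '' then leave;
    output;
  end;
run;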
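Here is another approach using regular expressions, based on a different assumption: that your substrings follow a letters + digits pattern rather than a fixed 4-character layout. The code below picks out each string that matches the letters + digits pattern (which includes the 2 letters + 2 digits case), one after another, until the whole input string has been consumed. 'waterfall_reason' is flagged only for the first substring picked, and 'unique_reason' is derived with countw(), using letters as delimiters.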
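data have;
  /* the $20. width is an assumption; the width was lost from the original post */
  input ID Reason_Code :$20. Value;
cards;
123 ABcd31AB45 500
124 AB43RD49T87 640
125 RT87 900
126 C19RV29 1000
;;;;

data want;
  set have;
  /* one or more letters followed by one or more digits, case-insensitive */
  _pat=prxparse('/[a-z]+[0-9]+/io');
  _start=1;
  _stop=length(reason_code);
  /* with letters as delimiters, each code leaves one "word" of digits */
  unique_reason=ifn(countw(reason_code,,'a')=1,1,0);
  do _n=1 by 1 until (_pos = 0);
    call prxnext(_pat,_start,_stop,reason_code,_pos,_len);
    /* _pos is 0 once no further match is found; skip SUBSTR() in that case */
    if _pos > 0 then do;
      new_code=substr(reason_code,_pos,_len);
      waterfall_reason=ifn(_n=1,1,0);
      output;
    end;
  end;
  drop _:;
run;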
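The unique_reason flag relies on how countw() behaves when letters are added as delimiters through the 'a' modifier. The sketch below calls countw() directly on two of the sample values so that the counts can be checked in the log; the variable names single and multi are made up for illustration.

```sas
data _null_;
  /* the 'a' modifier adds alphabetic characters to the delimiter list, */
  /* so only the runs of digits are counted as words                    */
  single = countw('RT87', , 'a');
  multi  = countw('AB43RD49T87', , 'a');
  put single= multi=;   /* writes both counts to the log */
run;
```

A count of 1 means the list holds a single code, which is the condition the answer tests with ifn().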