Extract 6- and 8-digit numbers from text string in SAS

Question

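Long-time reader, first-time poster.

I've been searching for an answer, but nothing I've found is within my skill level to turn into a solution. Any help would be greatly appreciated!

I'm trying to extract numbers from a text field in a SAS dataset, so either in PROC SQL or in a DATA step.

I want to return groups of digits from a free-text field.

The field contains one of the following:

an 8-digit number
the above AND a 6-digit number, which is sometimes split into groups of 2 digits by various punctuation
neither

These can appear anywhere in the text, the text can be any length, and there may or may not be text on either side. For example: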

REC    NOTES

001    Collateral 83948572 (code 56/56-55) open June 2013

002    Scoobydoo 12.12.12 88888888

003    54545454 over three years

I would like to extract the following into the output:

8-digit no. if present     | 6-digit no. if present

83948572                   | 565655
88888888                   | 121212
54545454                   |

Can anyone suggest a direction for me to look in?

Answer 1

Use the SUBSTRING, STUFF, and PATINDEX functions.

SELECT REC, 
substring(STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') , patindex('%[0-9][0-9][0-9][0-9][0-9][0-9]%', STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') ), 6) AS "6digit",
substring(STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') , patindex('%[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]%', STUFF(NOTES, PATINDEX('%[^0-9]%', NOTES), 1, '') ), 8) AS "8digit"
FROM yourtable

Answer 2

Try this:

data have;
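/* assumed width: $60. is long enough for the sample rows */
input REC $    NOTES $60.;
/* remove all letters so only digits, punctuation and blanks remain */
temp=prxchange('s/[a-z]+//i',-1,notes);
/* split on blanks only, using the same delimiter in COUNTW and SCAN */
do i=1 to countw(temp,' ');
   /* keep digits only, so 56/56-55 becomes 565655 */
   num=compress(scan(temp,i,' '),,'kd');
   if length(num)=8 then num8=num;
   else if length(num)=6 then num6=num;
end;
drop notes num i temp;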
cards;
001    Collateral 83948572 (code 56/56-55) open June 2013
002    Scoobydoo 12.12.12 88888888
003    54545454 over three years
;
proc print ;
run;
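
For comparison, here is a rough sketch of the same extraction done with two Perl regular expressions instead of splitting into words. It is only an illustration under some assumptions: the dataset names `notes_in` and `want`, the variables `re8` and `re6`, and the `$60.` width are made up for the example, and the 6-digit number is assumed to be three 2-digit groups with at most one punctuation character between them.

```sas
/* Sketch only: notes_in, want, re8 and re6 are illustrative names */
data notes_in;
  infile datalines truncover;
  input rec $ notes $60.;   /* width 60 is an assumption */
  datalines;
001    Collateral 83948572 (code 56/56-55) open June 2013
002    Scoobydoo 12.12.12 88888888
003    54545454 over three years
;

data want;
  set notes_in;
  length num8 $8 num6 $6;
  retain re8 re6;
  if _n_ = 1 then do;
    /* exactly 8 consecutive digits, not part of a longer run of digits */
    re8 = prxparse('/(?<!\d)\d{8}(?!\d)/');
    /* three 2-digit groups, each optionally separated by one punctuation character */
    re6 = prxparse('/(?<!\d)\d{2}[^A-Za-z0-9\s]?\d{2}[^A-Za-z0-9\s]?\d{2}(?!\d)/');
  end;
  /* capture buffer 0 of PRXPOSN is the whole match */
  if prxmatch(re8, notes) then num8 = prxposn(re8, 0, notes);
  /* keep only the digits of the match, so 56/56-55 becomes 565655 */
  if prxmatch(re6, notes) then num6 = compress(prxposn(re6, 0, notes), , 'kd');
  drop re8 re6;
run;

proc print data=want;
run;
```

The lookarounds `(?<!\d)` and `(?!\d)` are there so the 6-digit pattern cannot match inside the 8-digit number or inside a year such as 2013. Only the first match of each pattern is kept, so a note with several candidates would need a loop around PRXNEXT instead.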
