Pick out specific text of a text file using regular expressions in SAS

Question

I have the following data (or similar data):

DATA test2;
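INFILE DATALINES TRUNCOVER; /* keep short and blank lines as their own records */
INPUT STRING $100.;         /* width 100 is an assumption; the original width was lost */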
PUT STRING;
DATALINES;

James Bond is a spy
Hello World
123 Mill st P BOX 223
11 prospect ave p o box

P Box 225
Hello World
pobox 2212

P. O. box. 256
; 
run;

I only want to read the lines from "Hello World" up to the next blank line, so that my output would be:

Hello World
123 Mill st P BOX 223
11 prospect ave p o box

Hello World
pobox 2212

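My idea is to do some processing on each of these two (or, in general, more) texts and then append them together. But first I just need to filter out the text I need. Note that my original text file is very large and I don't know where the blank lines are.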

What I tried next is this:

data test3;
 set test2;
 if _n_=1 then do; 
 retain startline endline;
 startline = prxparse('/Hello World/');
 endline = prxparse('/^\s/');
 end;

 if (prxmatch(startline,STRING)=1 or prxmatch(endline,STRING)=1) ;
 run;

It gives me only the "Hello World" lines and the blank lines, but I also need the lines in between.

Edit: I should stress that there may be blank lines anywhere in the text, but I only want the lines between "Hello World" and the next blank line.

Answer 1

I think I get the desired output with this code.

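 data test3;
 set test2;
 retain outputflag;                                        /* keep the flag from row to row */
 if find(upcase(string),'HELLO WORLD') then outputflag=1;  /* a block starts here */
 if outputflag then output;                                /* write every row inside a block */
 if string='' then outputflag=0;                           /* the blank line is written, then ends the block */
 run;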
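Building on this flag approach, the sketch below also numbers each block, which may help with the plan in the question of processing each text separately before appending the results. It is self-contained: the sample data is retyped from the question, the width $100 and the names LINES, BLOCKS, INBLOCK and BLOCK are illustrative choices, and unlike the code above it drops the closing blank line instead of writing it.

```sas
/* Sample data retyped from the question; width 100 is an assumption */
data lines;
   infile datalines truncover;   /* blank lines become blank records */
   input STRING $100.;
   datalines;

James Bond is a spy
Hello World
123 Mill st P BOX 223
11 prospect ave p o box

P Box 225
Hello World
pobox 2212

P. O. box. 256
;
run;

data blocks;
   set lines;
   retain inblock 0;                  /* 1 while inside a block */
   /* A "Hello World" line opens a new block and bumps the counter */
   if find(upcase(STRING), 'HELLO WORLD') then do;
      inblock = 1;
      block + 1;                      /* sum statement: BLOCK is retained and starts at 0 */
   end;
   /* A blank line closes the block and is not written */
   if missing(STRING) then inblock = 0;
   if inblock then output;
   drop inblock;
run;
```

Each block can then be handled with BY BLOCK processing (FIRST.BLOCK / LAST.BLOCK), and the processed results appended with a SET statement.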

Answer 2

You have to check for the start and the end separately and retain a flag.

Edit: this outputs only the required data lines. The concatenation has to be done in a separate step.

data test3;
 set test2;

 if _n_=1 then do; 
 retain startline endline start ;
 startline = prxparse('/Hello World/');
 endline = prxparse('/^\s/');
 end;

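 if prxmatch(endline,STRING)   then start = 0;      /* blank line: leave the block */
 else if prxmatch(startline,STRING) then start = 1; /* "Hello World": enter a block */
 if start then output;                              /* keep only lines inside a block */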

 run;
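For the separate concatenation step, one possible sketch reads the filtered lines and joins each block into a single observation, starting a new block at every "Hello World" line (which, unlike the step below, keeps "Hello World" as part of the text). The dataset KEPT only stands in for the filter output, i.e. the five lines that step keeps from the sample data; the names KEPT, JOINED and TEXT and the $500 length are assumptions. The END= flag makes sure the last block is written even though no further "Hello World" line follows it.

```sas
/* Stand-in for the output of the filter step (width 100 is an assumption) */
data kept;
   infile datalines truncover;
   input STRING $100.;
   datalines;
Hello World
123 Mill st P BOX 223
11 prospect ave p o box
Hello World
pobox 2212
;
run;

data joined;
   set kept end=last;            /* LAST = 1 on the final record */
   length text $500;             /* assumed maximum length of one block */
   retain text;
   /* A "Hello World" line starts a new block: write out the previous one first */
   if prxmatch('/^Hello World/', STRING) then do;
      if not missing(text) then output;
      text = '';
   end;
   /* CATX strips leading/trailing blanks and joins with a single space */
   text = catx(' ', text, STRING);
   /* Write the final block once the last record has been read */
   if last then output;
   keep text;
run;
```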

Concatenation:

data test3;
 set test2;

 length OUTPUT $500; /* assumed maximum length; the original value was garbled */
 if _n_=1 then do; 
 retain startline endline start OUTPUT;
 startline = prxparse('/Hello World/');
 endline = prxparse('/^\s/');
 end;

 if prxmatch(endline,STRING) and OUTPUT ne "" then do; /* check for endline - output string as observation and reset  */
    output;
    start = 0;
    OUTPUT = "";
 end;

 if start then do;
    /* Add text manipulation here */
    OUTPUT = catx(" ",OUTPUT,STRING); /* concat string */
 end;

 if prxmatch(startline,STRING) then start = 1; /* check for startline */

 keep output;

 run;
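One caveat with this step: a block is only written when a blank line follows it, so if the file ends right after the last block, that block is lost. A minimal sketch of one way around this keeps the same logic, adds END= to the SET statement and flushes the pending text on the last record. The sample data, the dataset names SAMPLE and BLOCKS_EOF, the variable name TEXT (used instead of OUTPUT to avoid confusion with the OUTPUT statement) and the $500 length are all illustrative.

```sas
/* Illustrative sample: the last block is NOT followed by a blank line */
data sample;
   infile datalines truncover;
   input STRING $100.;           /* width 100 is an assumption */
   datalines;
Hello World
123 Mill st P BOX 223

Hello World
pobox 2212
;
run;

data blocks_eof;
   set sample end=last;          /* LAST = 1 on the final record */
   length text $500;             /* assumed maximum length of a block */
   retain startline endline start text;
   if _n_ = 1 then do;
      startline = prxparse('/Hello World/');
      endline   = prxparse('/^\s/');
   end;
   /* Blank line: write out the collected block and reset */
   if prxmatch(endline, STRING) and text ne "" then do;
      output;
      start = 0;
      text = "";
   end;
   /* Inside a block: append the line */
   if start then text = catx(" ", text, STRING);
   /* "Hello World" line: the following lines belong to a block */
   if prxmatch(startline, STRING) then start = 1;
   /* End of data without a closing blank line: flush the last block */
   if last and text ne "" then output;
   keep text;
run;
```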


sas

regular-language