Deleting repeating records in SAS

Question

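I have a table that will be loaded into an Oracle database. I need to remove duplicate values without changing the order of the data. Each group has up to 5 records.
1. Blank rows need to be removed.
2. Duplicate names need to be removed, so that only distinct names appear.
3. The data cannot be re-sorted.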

1   Commingled Data
2   Social Security
3   
4   
5   SSA  1996
1   Commingled Data
2   Social Security 
3   
4   
5   SSA 1997
1   Commingled Data
2   Social Security
3   
4   
5   SSA  -1998
1   Commingled Data
2   Statistical Administrative 
3   
4   
5   StARS 2000
1   Federal
2    Treasury
3   Internal 
4   1099
5   Master File - TY 1997 (1099/IRMF)
1   Federal 
2    Treasury
3   Internal 
4   1099
5   Master File - TY 1998 (1099/IRMF)
1   State
2    Wage
3   Indiana
4   
5    Indiana - 1990Q1-2005Q2
1   Federal 
2    Treasury
3   Internal 
4   1040
5    TY 2003 (1040/IMF) 1% File
1   Federal 
2    Treasury
3   Internal
4   1040
5   TY 2003 (1040/IMF) Cycles 1-39

Answer 1

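This is a good use case for the HASH object. If you use multidata:'n' and the ref method, it checks whether the record is already in the hash table and adds it if it is not, so duplicates are never added.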

Here I add rownum so that I can return to the original sort order, because the hash table is a binary tree and has no natural order unless you impose one.

data have;
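input @1 line $50.;  /* the informat width was lost in the source; $50. is an assumption wide enough for the longest row */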
datalines;
1   Commingled Data
2   Social Security
3   
4   
5   SSA  1996
1   Commingled Data
2   Social Security 
3   
4   
5   SSA 1997
1   Commingled Data
2   Social Security
3   
4   
5   SSA  -1998
1   Commingled Data
2   Statistical Administrative 
3   
4   
5   StARS 2000
1   Federal
2    Treasury
3   Internal 
4   1099
5   Master File - TY 1997 (1099/IRMF)
1   Federal 
2    Treasury
3   Internal 
4   1099
5   Master File - TY 1998 (1099/IRMF)
1   State
2    Wage
3   Indiana
4   
5    Indiana - 1990Q1-2005Q2
1   Federal 
2    Treasury
3   Internal 
4   1040
5    TY 2003 (1040/IMF) 1% File
1   Federal 
2    Treasury
3   Internal
4   1040
5   TY 2003 (1040/IMF) Cycles 1-39
;;;;
run;

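data _null_;
  set have end=eof;
  rownum = _n_;  /* remember the original position of each row */
  if _n_=1 then do;
    /* unordered hash keyed on the text; multidata:'n' keeps one entry per key */
    declare hash h(ordered:'n', multidata:'n');
    h.defineKey('line');
    h.defineData('line', 'rownum');
    h.defineDone();
  end;
  /* skip blank rows; ref() adds the key only if it is not already present */
  if not missing(substr(line,3)) then rc = h.ref();
  if eof then do;
    h.output(dataset:'want');  /* write the distinct rows once, at end of input */
  end;
run;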

proc sort data=want;
  by rownum;
run;
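
If the final sort is undesirable, one possible variation is to use the hash only as a "seen" lookup and let the DATA step write each row the first time its text appears, so rows leave in the order they were read. This is a minimal sketch rather than part of the answer above: it assumes the same have dataset and line variable, and the names seen and want_unsorted are hypothetical.

```sas
data want_unsorted;                 /* hypothetical output name */
  set have;
  if _n_ = 1 then do;
    declare hash seen();            /* key-only lookup of rows already written */
    seen.defineKey('line');
    seen.defineDone();
  end;
  /* requirement 1: drop rows that are blank after the leading position number */
  if missing(substr(line, 3)) then delete;
  /* check() returns 0 when the key is already stored, i.e. a repeat */
  if seen.check() = 0 then delete;
  rc = seen.add();                  /* first occurrence: remember it */
  drop rc;
run;                                /* remaining rows are output in input order */
```

Because rows are written as they are read, no rownum variable or PROC SORT step is needed in this variant.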
