Standardizing Heterogeneous Age Data in SPSS or Excel

I am trying to standardize a column of age data (i.e. convert it to years/months) using SPSS, SPSS syntax, or Excel. My intuition is to use a series of DO IF blocks, i.e.:

DO IF CHAR.INDEX(Age, "y")>1... for years
DO IF CHAR.INDEX(Age, "m")>1... for months
DO IF CHAR.INDEX(Age, "d")>1... for days

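The program would then take the number immediately preceding each unit string as the count of years/months/days and add it to a running total in a new variable, probably in days (the smallest unit), which could later be converted to years.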

For example, for the cell "3 years 5 months": add 3*365 + 5*30.5 = 1247.5 (about 1248) days to the new variable (something like "DaysOld").
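
The arithmetic in that example can be checked with a short sketch. It is written in Python purely for illustration (it is not SPSS syntax or an Excel formula). The function name `days_old` and the constant names are hypothetical; the factors 365 and 30.5 are the ones used in the example above, which gives 1247.5 before rounding.

```python
# Illustrative only: conversion factors taken from the example above.
DAYS_PER_YEAR = 365
DAYS_PER_MONTH = 30.5


def days_old(years=0.0, months=0.0, days=0.0):
    """Convert an age given as years, months and days to a total in days."""
    return years * DAYS_PER_YEAR + months * DAYS_PER_MONTH + days


print(days_old(years=3, months=5))  # 1247.5
```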

Examples of cell contents (a number with no accompanying text is assumed to be years):

2    
5 months    
11 days    
1.7    
13 yr    
22 yrs    
13 months    
10 mo    
6/19/2016    
3y10m    
10m    
12y    
3.5 years    
3 years    
11 mos    
1 year 10 months    
1 year, two months    
20 Y    
13 y/o    
3 years in 2014
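
The rule stated above the list (a number with no accompanying text is taken as years) could be sketched roughly as follows. This is a Python illustration rather than SPSS syntax, and the function name `bare_number_as_years` is hypothetical. It assumes a "bare number" means digits with an optional decimal part and nothing else, so dates such as 6/19/2016 are deliberately not matched.

```python
import re


def bare_number_as_years(cell):
    """Return the age in years if the cell is only a number, else None."""
    text = cell.strip()
    # Assumption: digits with an optional decimal part, and nothing else.
    if re.fullmatch(r"\d+(?:\.\d+)?", text):
        return float(text)
    return None


for cell in ["2", "1.7", "13 yr", "6/19/2016"]:
    print(repr(cell), "->", bare_number_as_years(cell))
```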

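The following syntax will handle many of the cases, but definitely not all of them (e.g. "1.7" or "3 years in 2014"). You will need to do more work on it, but this should give you a good start...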

First, I recreate your sample data to work with:

data list list/age (a30).
begin data
"2"
"5 months"
"11 days"
"1.7"
"13 yr"
"22 yrs"
"13 Months"
"10 mo"
"6/19/2016"
"3y10m"
"10m"
"12y"
"3.5 years"
"3 YEARS"
"11 mos"
"1 year 10 months"
"1 year, two months"
"20 Y"
"13 y/o"
"3 years in 2014"
end data.

Now to work:

* some necessary definitions.

string ageCleaned (a30) chr (a1) nm d m y (a5).
compute ageCleaned="".

* my first step is to create a "cleaned" age variable (it's possible to 
  manage without this variable but using this is better for debugging and
  improving the method).
* in the `ageCleaned` variable I only keep digits, periods (for decimal 
  point) and the characters "d", "m", "y".

do if CHAR.INDEX(lower(age),'ymd',1)>0.
loop #chrN=1 to char.length(age).
   compute chr=lower(char.substr(age,#chrN,1)).
   if CHAR.INDEX(chr,'0123456789ymd.',1)>0 ageCleaned=concat(rtrim(ageCleaned),chr).
end loop.
end if.

* the following line accounts for the word "days" which in the `ageCleaned` 
  variable has turned into the characters "dy".

compute ageCleaned=replace(ageCleaned,"dy","d").
exe.

* now I can work through the `ageCleaned` variable, accumulating digits 
  until I meet a character, then assigning the accumulated number to the
  right variable according to that character ("d", "m" or "y").

compute nm="".
loop #chrN=1 to char.length(ageCleaned).
   compute chr=char.substr(ageCleaned,#chrN,1).
   do if CHAR.INDEX(chr,'0123456789.',1)>0.
      compute nm=concat(rtrim(nm),chr).
   else.
      if chr="y" y=nm.
      if chr="m" m=nm.
      if chr="d" d=nm.
      compute nm="".
   end if.
end loop.
exe.

* we now have the numbers in string format, so after turning them into 
  numbers they are ready for use in calculations.

alter type d m y (f8.2).
compute DaysOld=sum(365*y, 30.5*m, d).
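
For checking the logic outside SPSS, the same three steps (clean, split by unit, sum) can be mirrored in a short Python sketch. This is an illustration under assumptions, not a replacement for the syntax above: the function names `clean_age`, `split_units`, `to_number` and `days_old` are hypothetical, and missing values are represented as `None`.

```python
def clean_age(age):
    """Keep only digits, '.', and the letters d/m/y, like ageCleaned."""
    lowered = age.lower()
    if not any(ch in "ymd" for ch in lowered):
        return ""  # no unit letter at all: left blank, like the DO IF guard
    cleaned = "".join(ch for ch in lowered if ch in "0123456789ymd.")
    return cleaned.replace("dy", "d")  # the word "days" collapses to "dy"


def to_number(text):
    """Convert accumulated digits to a float; None stands in for missing."""
    try:
        return float(text)
    except ValueError:
        return None


def split_units(cleaned):
    """Assign each accumulated number to the unit letter that follows it."""
    units = {"y": None, "m": None, "d": None}
    number = ""
    for ch in cleaned:
        if ch in "0123456789.":
            number += ch
        else:
            if ch in units:
                units[ch] = to_number(number)
            number = ""
    return units


def days_old(age):
    """Total age in days, or None when no unit carried a number."""
    factors = {"y": 365, "m": 30.5, "d": 1}
    units = split_units(clean_age(age))
    parts = [factors[u] * v for u, v in units.items() if v is not None]
    return sum(parts) if parts else None


for cell in ["5 months", "11 days", "3y10m", "1 year, two months", "2"]:
    print(f"{cell!r}: {days_old(cell)}")
```

As in the syntax above, a cell with no unit letter (such as "2") comes back as missing here, so bare numbers and dates still need separate handling.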