SAS importing JSON
I need to convert some JSON into a machine-friendly format (e.g. CSV, Excel, Stata, SAS), and I am using SAS because my file is large.
Example observation:
{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", "hours": {"Tuesday": {"close": "17:00", "open": "08:00"}, "Friday": {"close": "17:00", "open": "08:00"}, "Monday": {"close": "17:00", "open": "08:00"}, "Wednesday": {"close": "17:00", "open": "08:00"}, "Thursday": {"close": "17:00", "open": "08:00"}}, "open": true, "categories": ["Doctors", "Health & Medical"], "city": "Phoenix", "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "longitude": -111.98375799999999, "state": "AZ", "stars": 3.5, "latitude": 33.499313000000001, "attributes": {"By Appointment Only": true}, "type": "business"}
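I have been using the method recommended in http://support.sas.com/resources/papers/proceedings13/296-2013.pdf.
The problem is that not all observations have the same entries. For example, some observations may be missing "full_address".
So my sample code is now: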
filename data '(filename)';
data datatest; * defines dataset;
  infile data lrecl = 32000 truncover scanover;
  input
    @'"business_id": "' business_id $255.
  ;
  business_id = substr(business_id,1,index(business_id,'",')-1);
  IF INDEX(_INFILE_,'"full_address":') > 0
  THEN DO;
    input @'"full_address": "' full_address $255.;
    full_address = substr(full_address,1,index(full_address,'",')-1);
  END;
run;
proc print data = work.datatest;
run;
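The problem is that the code seems to skip every other observation. How can I prevent this?
Your problem is that your initial input reads past full_address (because it takes up 255 characters). You can solve this like so: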
input
  @'"business_id": "' business_id $255. +(-254) @
;
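This resets the pointer back to the start of the field so that the search for the next key can succeed. The trailing @ holds the current record, so the second INPUT statement reads from the same line instead of advancing to the next one.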
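Putting the pieces together, the full data step might look like the sketch below. It assumes the `$255.` informat implied by the 255-character remark above. It also uses `@1` rather than `+(-254)` to rewind before searching for `full_address`; that substitution is an assumption of this sketch, chosen so the search covers the whole record regardless of key order. As in the original code, it assumes each value is followed by `",`.

```sas
filename data '(filename)';

data datatest;
  infile data lrecl=32000 truncover scanover;

  /* The trailing @ holds the current record for the next INPUT statement. */
  input @'"business_id": "' business_id $255. @;
  business_id = substr(business_id, 1, index(business_id, '",') - 1);

  /* Only search for full_address when the held record contains that key. */
  if index(_infile_, '"full_address":') > 0 then do;
    /* @1 rewinds to column 1 (an assumption of this sketch, not the answer's +(-254)). */
    input @1 @'"full_address": "' full_address $255. @;
    full_address = substr(full_address, 1, index(full_address, '",') - 1);
  end;
run;

proc print data=work.datatest;
run;
```

Provided every record contains a `business_id`, both INPUT statements now work on the same held line, so each line of the file should yield one observation, with `full_address` left missing where the key is absent.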
You could also read this in a different way: if you have SAS 9.3 (I believe), PROC GROOVY can be used to read in JSON files much more easily. See my answer to this question for details.