Performance of SASHELP views versus SQL dictionary tables
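Why does creating a data set in a DATA step from the view sashelp.vcolumn take so much longer than the equivalent PROC SQL query against dictionary.columns?
I ran a test with the FULLSTIMER option, and the results seem to confirm my suspicion about the performance difference: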
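options fullstimer;  /* write detailed resource statistics to the log */
/* DATA step reading the SASHELP view */
data test1;
  set sashelp.vcolumn;
  where libname = 'SASHELP' and
        memname = 'CLASS' and
        memtype = 'DATA';
run;
/* Equivalent PROC SQL query against the DICTIONARY table */
proc sql;
  create table test2 as
    select *
    from dictionary.columns
    where libname = 'SASHELP' and
          memname = 'CLASS' and
          memtype = 'DATA';
quit;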
Log excerpt:
NOTE: There were 5 observations read from the data set SASHELP.VCOLUMN.
WHERE (libname='SASHELP') and (memname='CLASS') and (memtype='DATA');
NOTE: The data set WORK.TEST1 has 5 observations and 18 variables.
NOTE: DATA statement used (Total process time):
real time 0.67 seconds
user cpu time 0.23 seconds
system cpu time 0.23 seconds
memory 3820.75k
OS Memory 24300.00k
Timestamp 04/13/2015 09:42:21 AM
Step Count 5 Switch Count 0
NOTE: Table WORK.TEST2 created, with 5 rows and 18 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 3267.46k
OS Memory 24300.00k
Timestamp 04/13/2015 09:42:21 AM
Step Count 6 Switch Count 0
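The SASHELP view uses slightly more memory, but the difference is small. Note the timing, though: the DATA step with the SASHELP view took about 22 times longer than the query against the SQL dictionary table (0.67 vs 0.03 seconds real time). Surely that cannot be explained by the relatively small difference in memory usage alone.
At @Salva's suggestion, I resubmitted the code in a new SAS session, this time running the SQL step before the DATA step. The memory and time differences are even more pronounced: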
                | PROC SQL    | DATA step
                | dictionary  | sashelp
----------------+-------------+-------------
real time       | 0.28 sec    | 1.84 sec
user cpu time   | 0.00 sec    | 0.25 sec
system cpu time | 0.00 sec    | 0.24 sec
memory          | 3164.78k    | 4139.53k
OS Memory       | 10456.00k   | 13292.00k
Step Count      | 1           | 2
Switch Count    | 0           | 0
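Some, if not all, of the difference is the overhead of the DATA step compared with PROC SQL. For example, this PROC SQL query against the SASHELP view itself: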
proc sql;
  create table test2 as
    select *
    from sashelp.vcolumn
    where libname = 'SASHELP' and
          memname = 'CLASS' and
          memtype = 'DATA';
quit;
is also fast.
The SAS documentation page about DICTIONARY tables gives what may be the main explanation:
When querying a DICTIONARY table, SAS launches a discovery process
that gathers information that is pertinent to that table. Depending on
the DICTIONARY table that is being queried, this discovery process can
search libraries, open tables, and execute views. Unlike other SAS
procedures and the DATA step, PROC SQL can mitigate this process by
optimizing the query before the discovery process is launched.
Therefore, although it is possible to access DICTIONARY table
information with SAS procedures or the DATA step by using the SASHELP
views, it is often more efficient to use PROC SQL instead.
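Following that explanation, one possible workaround when a DATA step is unavoidable is to put the filter inside a PROC SQL view of your own, so that the WHERE clause is part of the query PROC SQL optimizes before discovery starts, rather than a DATA step WHERE applied to the generic SASHELP view. This is a sketch under that assumption and has not been timed here; the view name `work.vclass_cols` and the output data set `test3` are hypothetical names chosen for illustration.

```sas
options fullstimer;  /* report detailed resource usage in the log */

/* Hypothetical view: the filter is embedded in the SQL query itself, */
/* so PROC SQL can optimize it before the discovery process runs.     */
proc sql;
  create view work.vclass_cols as
    select *
    from dictionary.columns
    where libname = 'SASHELP' and
          memname = 'CLASS' and
          memtype = 'DATA';
quit;

/* The DATA step now reads a view that is already filtered */
data test3;
  set work.vclass_cols;
run;
```

Comparing the FULLSTIMER output of this step with the `test1` step above would show whether the assumption holds in a given session.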
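In my experience, the SASHELP views are also slower than PROC DATASETS. This is especially true if you have many libraries assigned, particularly external ones: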
proc datasets lib=sashelp noprint;
  contents data=class out=work.test2;
quit;
NOTE: The data set WORK.TEST2 has 5 observations and 40 variables.
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 635.12k
OS Memory 9404.00k
Timestamp 14.04.2015 kl 10.22
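The same metadata can presumably also be written out with PROC CONTENTS, whose OUT= option should produce a data set with the same layout as the CONTENTS statement of PROC DATASETS used above. This is a sketch only and has not been timed here; the output name `work.class_contents` is hypothetical.

```sas
options fullstimer;  /* report detailed resource usage in the log */

/* Write one row per variable of SASHELP.CLASS to a data set; */
/* NOPRINT suppresses the printed listing.                    */
proc contents data=sashelp.class out=work.class_contents noprint;
run;
```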