Proc Transpose 的 SAS Hive SQL (Hadoop) 版本?
SAS Hive SQL (Hadoop) version of Proc Transpose?
我想知道 SAS Hive SQL (Hadoop) 中是否有 'Proc Transpose' 的版本?
否则我可以看到唯一的其他(冗长的)方法是创建许多单独的 table 然后重新组合在一起,我宁愿避免这种情况。
欢迎任何帮助!
示例 table 转置 > 打算将月份放在 table 的顶部,以便按月拆分费率:
+------+-------+----------+----------+-------+
| YEAR | MONTH | Geog | Category | Rates |
+------+-------+----------+----------+-------+
| 2018 | 1 | National | X | 32 |
| 2018 | 1 | National | Y | 43 |
| 2018 | 1 | National | Z | 47 |
| 2018 | 1 | Regional | X | 52 |
| 2018 | 1 | Regional | Y | 38 |
| 2018 | 1 | Regional | Z | 65 |
| 2018 | 2 | National | X | 63 |
| 2018 | 2 | National | Y | 14 |
| 2018 | 2 | National | Z | 34 |
| 2018 | 2 | Regional | X | 90 |
| 2018 | 2 | Regional | Y | 71 |
| 2018 | 2 | Regional | Z | 69 |
+------+-------+----------+----------+-------+
示例输出:
+------+----------+----------+----+----+
| YEAR | Geog | Category | 1 | 2 |
+------+----------+----------+----+----+
| 2018 | National | X | 32 | 63 |
| 2018 | National | Y | 43 | 14 |
| 2018 | National | Z | 47 | 34 |
| 2018 | Regional | X | 52 | 90 |
| 2018 | Regional | Y | 38 | 71 |
| 2018 | Regional | Z | 65 | 69 |
+------+----------+----------+----+----+
典型的 wallpaper SQL 转置(或旋转)技术是 group+transform to pivot case statements sub - group 聚合查询中的查询折叠子查询。该组代表单个结果数据透视行。
比如你的组是year,geog,category并且min
用来折叠:
proc sql;
create view want_pivot as
select year, geog, category
, min(rate_m1) as rate_m1
, min(rate_m2) as rate_m2
from
( select
year, geog, category
, case when month=1 then rates end as rate_m1
, case when month=2 then rates end as rate_m2
from have
)
group by year, geog, category
;
这里是相同的概念,更通用一点,其中数据在组内在详细级别重复,mean
用于折叠重复。
data have;
input id name $ value;
datalines;
1 a 1
1 a 2
1 a 3
1 b 2
1 c 3
2 a 2
2 d 4
2 b 5
3 e 1
run;
proc sql;
create view have_pivot as
select
id
, mean(a) as a
, mean(b) as b
, mean(c) as c
, mean(d) as d
, mean(e) as e
from
(
select
id
, case when name='a' then value end as a
, case when name='b' then value end as b
, case when name='c' then value end as c
, case when name='d' then value end as d
, case when name='e' then value end as e
from have
)
group by id
;
quit;
当先验未知列名时,您将需要编写一个代码生成器来传递所有数据以确定名称值,编写墙纸查询,该查询将对返回的数据执行第二次传递枢轴.
此外,许多现代数据库都有一个 PIVOT 子句,可以通过传递来利用它。
Hadoop Mania post "TRANSPOSE/PIVOT a Table in Hive" 以类似壁纸的方式展示了 collect_list
和 map
的用法:
select b.id, b.code, concat_ws('',b.p) as p, concat_ws('',b.q) as q, concat_ws('',b.r) as r, concat_ws('',b.t) as t from
(select id, code,
collect_list(a.group_map['p']) as p,
collect_list(a.group_map['q']) as q,
collect_list(a.group_map['r']) as r,
collect_list(a.group_map['t']) as t
from ( select
id, code,
map(key,value) as group_map
from test_sample
) a group by a.id, a.code) b;
如果您的样本数据集代表真实数据集,那么您可以使用如下所示的简单内部联接。年份地理和类别使得下面代码的独特组合应该有效。
select a.YEAR ,
a.Geog ,
a.Category ,
a.Rates ,
a.month as month_1,
b.month as month_2
from have a
inner join
have b
on a.year = b.year
and a.Geog = b.Geog
and a.Category = b.category
where a.month ne b.month;
我想知道 SAS Hive SQL (Hadoop) 中是否有 'Proc Transpose' 的版本?
否则我可以看到唯一的其他(冗长的)方法是创建许多单独的 table 然后重新组合在一起,我宁愿避免这种情况。
欢迎任何帮助!
示例 table 转置 > 打算将月份放在 table 的顶部,以便按月拆分费率:
+------+-------+----------+----------+-------+
| YEAR | MONTH | Geog | Category | Rates |
+------+-------+----------+----------+-------+
| 2018 | 1 | National | X | 32 |
| 2018 | 1 | National | Y | 43 |
| 2018 | 1 | National | Z | 47 |
| 2018 | 1 | Regional | X | 52 |
| 2018 | 1 | Regional | Y | 38 |
| 2018 | 1 | Regional | Z | 65 |
| 2018 | 2 | National | X | 63 |
| 2018 | 2 | National | Y | 14 |
| 2018 | 2 | National | Z | 34 |
| 2018 | 2 | Regional | X | 90 |
| 2018 | 2 | Regional | Y | 71 |
| 2018 | 2 | Regional | Z | 69 |
+------+-------+----------+----------+-------+
示例输出:
+------+----------+----------+----+----+
| YEAR | Geog | Category | 1 | 2 |
+------+----------+----------+----+----+
| 2018 | National | X | 32 | 63 |
| 2018 | National | Y | 43 | 14 |
| 2018 | National | Z | 47 | 34 |
| 2018 | Regional | X | 52 | 90 |
| 2018 | Regional | Y | 38 | 71 |
| 2018 | Regional | Z | 65 | 69 |
+------+----------+----------+----+----+
典型的 wallpaper SQL 转置(或旋转)技术是 group+transform to pivot case statements sub - group 聚合查询中的查询折叠子查询。该组代表单个结果数据透视行。
比如你的组是year,geog,category并且min
用来折叠:
proc sql;
create view want_pivot as
select year, geog, category
, min(rate_m1) as rate_m1
, min(rate_m2) as rate_m2
from
( select
year, geog, category
, case when month=1 then rates end as rate_m1
, case when month=2 then rates end as rate_m2
from have
)
group by year, geog, category
;
这里是相同的概念,更通用一点,其中数据在组内在详细级别重复,mean
用于折叠重复。
data have;
input id name $ value;
datalines;
1 a 1
1 a 2
1 a 3
1 b 2
1 c 3
2 a 2
2 d 4
2 b 5
3 e 1
run;
proc sql;
create view have_pivot as
select
id
, mean(a) as a
, mean(b) as b
, mean(c) as c
, mean(d) as d
, mean(e) as e
from
(
select
id
, case when name='a' then value end as a
, case when name='b' then value end as b
, case when name='c' then value end as c
, case when name='d' then value end as d
, case when name='e' then value end as e
from have
)
group by id
;
quit;
当先验未知列名时,您将需要编写一个代码生成器来传递所有数据以确定名称值,编写墙纸查询,该查询将对返回的数据执行第二次传递枢轴.
此外,许多现代数据库都有一个 PIVOT 子句,可以通过传递来利用它。
Hadoop Mania post "TRANSPOSE/PIVOT a Table in Hive" 以类似壁纸的方式展示了 collect_list
和 map
的用法:
select b.id, b.code, concat_ws('',b.p) as p, concat_ws('',b.q) as q, concat_ws('',b.r) as r, concat_ws('',b.t) as t from
(select id, code,
collect_list(a.group_map['p']) as p,
collect_list(a.group_map['q']) as q,
collect_list(a.group_map['r']) as r,
collect_list(a.group_map['t']) as t
from ( select
id, code,
map(key,value) as group_map
from test_sample
) a group by a.id, a.code) b;
如果您的样本数据集代表真实数据集,那么您可以使用如下所示的简单内部联接。年份地理和类别使得下面代码的独特组合应该有效。
select a.YEAR ,
a.Geog ,
a.Category ,
a.Rates ,
a.month as month_1,
b.month as month_2
from have a
inner join
have b
on a.year = b.year
and a.Geog = b.Geog
and a.Category = b.category
where a.month ne b.month;