Hive SELECT statement to create an ARRAY of STRUCTS
I'm having trouble selecting an array of structs in Hive.
My source table looks like this:
+-------------+--+
| field |
+-------------+--+
| id |
| fieldid |
| fieldlabel |
| fieldtype |
| answer_id |
| unitname |
+-------------+--+
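This is survey data, where id is the survey id, the middle four fields are the response data, and unitname is the business unit the survey belongs to.
I need to create an array of structs holding all of the answers for each survey id. I thought this would work, but it doesn't: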
select id,
       array(
         named_struct(
           "field_id", fieldid,
           "field_label", fieldlabel,
           "field_type", fieldtype,
           "answer_id", answer_id)) as answers,
       unitname
from new_answers;
What it returns is each survey answer (field_id) as its own array containing a single struct, like this:
id | answers | unitname
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069}] | Location2
2 | [{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070}] | Location2
2 | [{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2
But what I need is this:
id | answers | unitname
1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},
{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},
{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2
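I've searched and searched, but every answer I find seems to be about using INSERT INTO....VALUES() queries. I already have a table structure; I just can't get the ARRAY to come out the way it should.
Any help would be greatly appreciated.
For reproduction purposes, if needed: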
CREATE TABLE `new_answers`(
`id` bigint,
`fieldid` bigint,
`fieldlabel` string,
`fieldtype` string,
`answer_id` bigint,
`unitname` string);
INSERT INTO new_answers VALUES
(1,175877,"Comment","COMMENT",8990947803,"Location1"),
(2,47824,"Language","MULTIPLE_CHOICE",8990950069,"Location2"),
(2,48187,"Language Type","MULTIPLE_CHOICE",8990950070,"Location2"),
(2,47829,"Trans #","TEXT",8990950071,"Location2");
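The functionality you seem to be looking for is collecting the structs into an array. Hive ships with two functions for collecting things into arrays: collect_set and collect_list. However, these functions only work for building arrays of primitive types.
The jar from the brickhouse project (https://github.com/klout/brickhouse/wiki/Downloads) provides a number of functions, including the ability to collect complex types.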
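As a rough illustration of the difference: array() is a per-row constructor, while collect_list and collect_set are aggregates that fold the rows of each group into one array. A minimal sketch on the new_answers table from the question, using only primitive columns (the aliases labels and types are made up for this example):

```sql
-- One output row per (id, unitname) group.
select id,
       collect_list(fieldlabel) as labels,  -- keeps duplicates
       collect_set(fieldtype)   as types,   -- drops duplicates
       unitname
from new_answers
group by id, unitname;
```

With the sample data, id 2 should come back as a single row with three labels and two distinct types; the order of elements inside each array is not guaranteed.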
add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar;
You can then register the collect function under any name you like:
create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';
The following query:
select id
, collect_struct(
named_struct(
"field_id", fieldid,
"field_label", fieldlabel,
"field_type", fieldtype,
"answer_id", answer_id)) as answers
, unitname
from new_answers
group by id, unitname
;
gives the following result:
id answers unitname
1 [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
2 [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] Location2
select id,
collect_list(
named_struct(
"field_id", fieldid,
"field_label", fieldlabel,
"field_type", fieldtype,
"answer_id", answer_id)
) as answers,
unitname
from new_answers
group by id, unitname;
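collect_list is used to build the ARRAY (recent Hive versions accept struct arguments; older ones only accept primitive types).
named_struct is used to build the complex struct.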
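To sanity-check the result, the array can be unpacked again with LATERAL VIEW explode. This is only a sketch: it assumes a Hive version whose collect_list accepts struct arguments (otherwise substitute the Brickhouse collect function), and the aliases grouped, x, and a are arbitrary.

```sql
-- Build the array of structs per survey, then explode it back into rows.
select grouped.id,
       a.field_id,
       a.field_label,
       grouped.unitname
from (
  select id,
         collect_list(
           named_struct(
             "field_id", fieldid,
             "field_label", fieldlabel,
             "field_type", fieldtype,
             "answer_id", answer_id)) as answers,
         unitname
  from new_answers
  group by id, unitname
) grouped
lateral view explode(grouped.answers) x as a;
```

Each struct in answers becomes its own row again, so the sample data should yield four rows.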