Not able to create a table in Hive
I am trying to create a table in Hive 3.0 using the following schema, which I found online:
CREATE TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT< text : STRING, user : STRUCT<screen_name : STRING,name : STRING>, retweet_count : INT>,
entities STRUCT< urls : ARRAY<STRUT<expanded_url : STRING>>,
user_mentions : ARRAY<STRUCT<screen_name : STRING,name : STRING>>,
hashtags : ARRAY<STRUCT<text : STRING>>>,
text STRING,
user STRUCT< screen_name : STRING, name : STRING, friends_count : INT, followers_count : INT, statuses_count : INT, verified : BOOLEAN, utc_offset : INT, time_zone : STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JSONSerDe';
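When I press Enter I get a NoViableAltException. This is my first time using Hive and I have no experience with it. Can someone tell me what is wrong with the schema?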
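user is a reserved keyword. In Hive, a reserved keyword that is used as an identifier has to be enclosed in ` (backticks).
Example:
`user`
The schema in the question also misspells STRUCT as STRUT in the urls field of entities, and the statement below writes the SerDe class name as JsonSerDe rather than JSONSerDe.
Try the create table statement below: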
CREATE TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT< text : STRING, `user` : STRUCT<screen_name : STRING,name : STRING>, retweet_count : INT>,
entities STRUCT< urls : ARRAY<STRUCT<expanded_url : STRING>>,
user_mentions : ARRAY<STRUCT<screen_name : STRING,name : STRING>>,
hashtags : ARRAY<STRUCT<text : STRING>>>,
text STRING,
`user` STRUCT< screen_name : STRING, name : STRING, friends_count : INT, followers_count : INT, statuses_count : INT, verified : BOOLEAN, utc_offset : INT, time_zone : STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Location '/user/flume/tweets/';
I was able to create the table with the DDL above:
desc tweets;
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
| col_name | data_type | comment |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
| id | bigint | from deserializer |
| created_at | string | from deserializer |
| source | string | from deserializer |
| favorited | boolean | from deserializer |
| retweeted_status | struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int> | from deserializer |
| entities | struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> | from deserializer |
| text | string | from deserializer |
| user | struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string> | from deserializer |
| in_reply_to_screen_name | string | from deserializer |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
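Once a column is declared with backticks, queries that reference it need the same quoting. The following is a minimal sketch of how the nested fields might be read, assuming the tweets table above exists and already has data behind it; the column aliases (author, retweeted_author, first_hashtag) are made up for illustration and are not part of the original post.

```sql
-- Sketch only: assumes the tweets table defined above and some data in its directory.
SELECT
  id,
  `user`.screen_name           AS author,            -- top-level struct column named with a reserved word
  retweeted_status.`user`.name AS retweeted_author,  -- nested struct field with the same reserved name
  entities.hashtags[0].text    AS first_hashtag      -- first element of an array of structs
FROM tweets
LIMIT 10;
```

Fields that are not reserved words, such as screen_name or text, can be referenced without backticks.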
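UPDATE:
When you run a select statement, Hive looks in the directory the table points to and reads the data there according to your DDL statement. A table created without a Location clause points to /user/hive/warehouse/tweets/, and in this case no data exists in that directory, so the select statement returns no records.
To fix this:
Option 1. Move the data from /user/flume/tweets/ to the /user/hive/warehouse/tweets/ directory; you can then select the data from the tweets table.
`hadoop fs -mv /user/flume/tweets/* /user/hive/warehouse/tweets/`
(or)
Option 2. Create the Hive table on top of the /user/flume/tweets/ directory; the data will then be visible in the tweets table (use the create table statement above for this).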
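As a rough sketch of how the two options could be checked or carried out from inside Hive: DESCRIBE FORMATTED shows which directory a table actually points to, LOAD DATA INPATH is one Hive-side way to move files that are already in HDFS into that directory, and CREATE EXTERNAL TABLE ... LIKE is one possible way to put a second table on top of the Flume directory. The table name tweets_ext is hypothetical, and the choice of an external table is an assumption, not something the original answer uses.

```sql
-- Check where the table reads from: look for the "Location" row in the output.
DESCRIBE FORMATTED tweets;

-- Option 1, done from Hive instead of hadoop fs -mv:
-- moves the files under /user/flume/tweets/ into the table's own directory.
LOAD DATA INPATH '/user/flume/tweets/' INTO TABLE tweets;

-- Option 2 variant (an alternative to the statement above, not a follow-up to it):
-- tweets_ext is a hypothetical name; LIKE copies the column and SerDe definition of tweets.
-- An external table leaves the files in place, and dropping it does not delete them.
CREATE EXTERNAL TABLE tweets_ext LIKE tweets
LOCATION '/user/flume/tweets/';
```

Only one of the last two statements makes sense in a given setup: after LOAD DATA has moved the files, a table created on /user/flume/tweets/ would find nothing there.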