Querying compressed JSON data in a Hive external table throws an exception?
I created an external table with the following steps:
Hive> ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;
Hive> set hive.exec.compress.output=true;
Hive> set mapred.output.compress=true;
Hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
Hive> set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;
Hive> CREATE EXTERNAL TABLE Json (id BIGINT, created_at STRING, source STRING, favorited BOOLEAN)
      ROW FORMAT SERDE "com.cloudera.hive.serde.JSONSerDe"
      LOCATION "/user/cloudera/jsonGZ";
I compressed my JSON files by running the following command:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.6.0-cdh5.5.0.jar -Dmap.output.compress=true -Dmap.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec -input /user/cloudera/json/ -output /user/cloudera/jsonGZ
Then when I run "select * from json;" I get the following error:

OK Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.map.JsonMappingException: Can not deserialize instance of java.util.LinkedHashMap out of VALUE_NUMBER_INT token at
I also created a table using "org.apache.hive.hcatalog.data.JsonSerDe":

Hive> ADD JAR /usr/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
Hive> CREATE EXTERNAL TABLE Json1 (id BIGINT, created_at STRING, source STRING, favorited BOOLEAN)
      ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
      LOCATION "/user/cloudera/jsonGZ";
Then when I run "select * from json1;" I get the following error:

Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected" after using "org.apache.hive.hcatalog.core (hive-hcatalog-core-0.13.0.jar)"
Am I missing something? How can I fix this error?
Just gzip the files and put them, as-is (*.gz), in the table location:

gzip filename
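A minimal sketch of that approach, assuming a sample file named tweets.json (the file name is hypothetical) and the HDFS path from the question. Hive decompresses .gz files transparently at read time, so no compression settings or streaming job are needed:

```shell
# Create a one-record sample JSON file matching the table's columns.
echo '{"id":1,"created_at":"2016-01-01","source":"web","favorited":false}' > tweets.json

# Compress it in place; this produces tweets.json.gz.
gzip -f tweets.json

# Copy the .gz file into the external table's location, then query as usual.
# hdfs dfs -put tweets.json.gz /user/cloudera/jsonGZ/
```

The hadoop-streaming job in the question rewrites records as key/value output (part files), which no longer starts with a JSON token, hence the "Start token not found" error; plain gzipped JSON avoids that.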