Not able to use the mongo-hadoop connector in a Hive query on HDP
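I am new to Hadoop. I have installed the Hortonworks Sandbox 2.1.
I am trying to run a Hive script from the Hive UI. I want to access a MongoDB collection from Hive, and I used the following query for that: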
CREATE TABLE individuals
(
id INT,
name STRING,
age INT,
city STRING,
hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');
I added mongo-java-driver-2.12.2.jar, mongo-hadoop-core-1.3.0.jar and mongo-hadoop-hive-1.3.0.jar as file resources.
But when I execute the query, it fails with the following error:
15/03/11 04:38:24 INFO exec.DDLTask: Use StorageHandler-supplied com.mongodb.hadoop.hive.BSONSerDe for table individuals
15/03/11 04:38:24 ERROR exec.DDLTask: java.lang.NoClassDefFoundError: com/mongodb/util/JSON
at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:107)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:276)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:626)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:593)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.execute(BeeswaxServiceImpl.java:349)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.run(BeeswaxServiceImpl.java:614)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.run(BeeswaxServiceImpl.java:603)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1537)
at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.run(BeeswaxServiceImpl.java:603)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Can someone help me figure out what I am missing here?
Thanks in advance.
You need to map all of the fields in the MongoDB collection, not just "_id":
CREATE TABLE individuals
(
id INT,
name STRING,
age INT,
city STRING,
hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"<corresponding name in your collection>", "age":"<same here>", etc...}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');
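Separately from the column mapping, the stack trace reports a NoClassDefFoundError for com/mongodb/util/JSON, a class that ships in the mongo-java-driver jar, so it may also be worth checking that the three jars are actually on the classpath of the Hive session (resources registered as plain files rather than as jars may not be). Here is a minimal sketch of both steps. The /tmp/jars directory is a hypothetical location, and the MongoDB field names name, age, city and hobby are assumptions about the collection; adjust both to your setup.

```sql
-- Hypothetical paths: replace /tmp/jars with wherever the jars live on the sandbox.
ADD JAR /tmp/jars/mongo-java-driver-2.12.2.jar;
ADD JAR /tmp/jars/mongo-hadoop-core-1.3.0.jar;
ADD JAR /tmp/jars/mongo-hadoop-hive-1.3.0.jar;

-- Show the jars registered in the current session.
LIST JARS;

-- Same table as above, with every Hive column mapped explicitly.
-- The MongoDB field names on the right-hand side are assumed, not taken from your data.
CREATE TABLE individuals
(
id INT,
name STRING,
age INT,
city STRING,
hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","age":"age","city":"city","hobby":"hobby"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');
```

If LIST JARS does not show the driver jar, the NoClassDefFoundError would be expected no matter how the columns are mapped.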