Can't connect to Spark thriftserver using JDBC

I followed the Spark instructions to start a Thrift JDBC server:

$ ./spark-2.1.1-bin-hadoop2.7/sbin/start-thriftserver.sh

I can connect to it from beeline:

$ ./spark-2.1.1-bin-hadoop2.7/bin/beeline -u 'jdbc:hive2://localhost:10000'
Connecting to jdbc:hive2://localhost:10000
log4j:WARN No appenders could be found for logger (org.apache.hive.jdbc.Utils).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Connected to: Spark SQL (version 2.1.1)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://localhost:10000>

However, when I try to connect from DataGrip over JDBC with the same connection string, I get an error:

[2017-07-07 16:46:57] java.lang.ClassNotFoundException: org.apache.thrift.transport.TTransportException
[2017-07-07 16:46:57]   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
[2017-07-07 16:46:57]   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
[2017-07-07 16:46:57]   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
[2017-07-07 16:46:57]   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[2017-07-07 16:46:57]   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
[2017-07-07 16:46:57]   at com.intellij.database.remote.jdbc.impl.RemoteDriverImpl.connect(RemoteDriverImpl.java:27)
[2017-07-07 16:46:57]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2017-07-07 16:46:57]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2017-07-07 16:46:57]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2017-07-07 16:46:57]   at java.lang.reflect.Method.invoke(Method.java:498)
[2017-07-07 16:46:57]   at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:324)
[2017-07-07 16:46:57]   at sun.rmi.transport.Transport.run(Transport.java:200)
[2017-07-07 16:46:57]   at sun.rmi.transport.Transport.run(Transport.java:197)
[2017-07-07 16:46:57]   at java.security.AccessController.doPrivileged(Native Method)
[2017-07-07 16:46:57]   at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
[2017-07-07 16:46:57]   at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
[2017-07-07 16:46:57]   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
[2017-07-07 16:46:57]   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
[2017-07-07 16:46:57]   at java.security.AccessController.doPrivileged(Native Method)
[2017-07-07 16:46:57]   at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
[2017-07-07 16:46:57]   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[2017-07-07 16:46:57]   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[2017-07-07 16:46:57]   at java.lang.Thread.run(Thread.java:745) (no stack trace)

I configured DataGrip to use the JDBC library hive-jdbc-1.2.1.spark2.jar from the Spark installation folder.

After adding all the *.jar files from the spark/jars folder to the "JDBC drivers" window in DataGrip, it worked! I'm not sure exactly which libraries are required, but trial and error showed that many of them are. The ClassNotFoundException above simply means that a dependency of hive-jdbc (here libthrift, which provides org.apache.thrift.transport.TTransportException) was missing from the driver's classpath.

With the Spark 2.2.1 distribution, you need the following jars:

commons-logging-1.1.3.jar
hadoop-common-2.7.3.jar
hive-exec-1.2.1.spark2.jar
hive-jdbc-1.2.1.spark2.jar
hive-metastore-1.2.1.spark2.jar
httpclient-4.5.2.jar
httpcore-4.4.4.jar
libthrift-0.9.3.jar
slf4j-api-1.7.16.jar
spark-hive-thriftserver_2.11-2.2.1.jar
spark-network-common_2.11-2.2.1.jar

In DataGrip, select the driver class org.apache.hive.jdbc.HiveDriver and set Tx (transaction control) to Manual (Spark does not support autocommit).

You should now be able to connect using the URL jdbc:hive2://hostname:10000/
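
For reference, here is a minimal Java sketch of the same connection outside DataGrip (the hostname, port, and empty credentials are assumptions for an unsecured local setup, not part of the original answer):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftTest {
    public static void main(String[] args) throws Exception {
        // Loading the driver explicitly surfaces the same
        // ClassNotFoundException if the classpath is still incomplete.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // URL and empty credentials assume an unsecured local thriftserver.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/", "", "")) {
            // Spark does not support autocommit; this mirrors
            // DataGrip's Manual Tx setting.
            conn.setAutoCommit(false);

            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}

Compile and run it with the jars listed above on the classpath; if any are missing, the Class.forName call fails with the same kind of ClassNotFoundException.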

To add to tukushan's answer: you can simplify your life by using just two jars: hadoop-common-2.7.3.jar from the Spark distribution, and hive-jdbc-1.2.1-standalone.jar.
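
To check that those two jars alone are enough, a hypothetical smoke test like the following can be run with just them on the classpath (the class name is my own, for illustration):

public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        // The Hive JDBC driver itself.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // The class whose absence caused the original error; the standalone
        // jar bundles its Thrift dependency, so this should resolve too.
        Class.forName("org.apache.thrift.transport.TTransportException");
        System.out.println("Driver and Thrift classes resolved.");
    }
}

For example: java -cp hadoop-common-2.7.3.jar:hive-jdbc-1.2.1-standalone.jar:. ClasspathCheck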