Impala JDBC connection issue in Spark cluster mode
The Impala JDBC connection throws the following exception while running a Spark job in cluster mode. The Spark job creates a Hive table and performs an Impala table invalidate/refresh over JDBC. The same job runs successfully in Spark client mode.
java.sql.SQLException: [Simba][ImpalaJDBCDriver](500164) Error initialized or created transport for authentication: [Simba][ImpalaJDBCDriver](500169) Unable to connect to server: GSS initiate failed.
at com.cloudera.hivecommon.api.HiveServer2ClientFactory.createTransport(Unknown Source)
at com.cloudera.hivecommon.api.HiveServer2ClientFactory.createClient(Unknown Source)
at com.cloudera.hivecommon.core.HiveJDBCCommonConnection.connect(Unknown Source)
at com.cloudera.impala.core.ImpalaJDBCConnection.connect(Unknown Source)
at com.cloudera.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source)
at com.cloudera.jdbc.common.AbstractDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
import java.security.PrivilegedAction
import java.sql.{Connection, DriverManager}

import org.apache.hadoop.security.UserGroupInformation

protected def getImpalaConnection(impalaJdbcDriver: String, impalaJdbcUrl: String): Connection = {
  if (impalaJdbcDriver.isEmpty) return null
  try {
    // Register the Impala JDBC driver class
    Class.forName(impalaJdbcDriver).newInstance
    // Open the connection inside a doAs block so the Kerberos credentials of the
    // logged-in user are applied when the job runs in cluster mode
    UserGroupInformation.getLoginUser.doAs(
      new PrivilegedAction[Connection] {
        override def run(): Connection = DriverManager.getConnection(impalaJdbcUrl)
      }
    )
  } catch {
    case e: Exception =>
      println(e.toString + " --> " + e.getStackTraceString)
      throw e
  }
}
val impalaJdbcDriver = "com.cloudera.impala.jdbc41.Driver"
val impalaJdbcUrl = "jdbc:impala://<Impala_Host>:21050/default;AuthMech=1;SSL=1;KrbRealm=HOST.COM;KrbHostFQDN=_HOST;KrbServiceName=impala;REQUEST_POOL=xyz"

println("Start impala connection")
val impalaConnection = getImpalaConnection(impalaJdbcDriver, impalaJdbcUrl)
val result = impalaConnection.createStatement.executeQuery("SELECT COUNT(1) FROM testTable")
println("End impala connection")
Build a fat JAR and submit it with the spark-submit command given below. You can pass additional arguments such as --files and --jars if required.
Spark submit command:
spark-submit --master yarn-cluster --keytab /home/testuser/testuser.keytab --principal testuser@host.COM --queue xyz --class com.dim.UpdateImpala
Make the following change depending on your Spark version:
For Spark 1: UserGroupInformation.getLoginUser
For Spark 2: UserGroupInformation.getCurrentUser
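For example, under Spark 2 the doAs block in getImpalaConnection above would use getCurrentUser. A minimal sketch, with the rest of the method unchanged:

// Spark 2: the current user's UGI carries the credentials obtained from
// --keytab/--principal, so use it instead of getLoginUser.
UserGroupInformation.getCurrentUser.doAs(
  new PrivilegedAction[Connection] {
    override def run(): Connection = DriverManager.getConnection(impalaJdbcUrl)
  }
)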