Accessing data in a Compose PostgreSQL database from Spark as a Service Python notebook on Bluemix
I am trying to access data in a Postgres database from Spark as a Service on IBM Bluemix (using a Python notebook). Here is my code:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.load(source="jdbc",\
url="jdbc:postgresql://[publichost]:[port]/compose",\
dbtable="[tablename]")
df.take(2)
The error I get (on the df = line) is:
Py4JJavaError: An error occurred while calling o42.load.
: java.sql.SQLException: No suitable driver found for jdbc:postgresql://host:port/compose
at java.sql.DriverManager.getConnection(DriverManager.java:700)
at java.sql.DriverManager.getConnection(DriverManager.java:219)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector.apply(JDBCRDD.scala:188)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anonfun$getConnector.apply(JDBCRDD.scala:181)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:60)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:507)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:785)
Can I update this driver? Any suggestions or a working example would be greatly appreciated!
This happens because the PostgreSQL driver is not installed by default in your Spark service instance.
You need to add it before you can use it.
Switch the kernel to Scala from the menu and run the statement below. You only need to run it once per Spark instance; after that, the Postgres driver is available regardless of kernel type (Python, Scala, R), and you can use it directly.
In [1]:
%Addjar -f https://jdbc.postgresql.org/download/postgresql-9.4.1207.jre7.jar
Starting download from https://jdbc.postgresql.org/download/postgresql-9.4.1207.jre7.jar
Finished download of postgresql-9.4.1207.jre7.jar
In [5]:
#Now change the kernel back to Python
In [1]:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
In [3]:
# Ignore the connection error; it only occurs because the connection details here are placeholders.
# Replace publichost, port, databasename and tablename with your own values.
In [4]:
df = sqlContext.load(source="jdbc",\
url="jdbc:postgresql://[publichost]:[port]/databasename",\
dbtable="[tablename]")
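As a rough sketch, assuming Spark 1.4 or later, the same load can also be written with the DataFrameReader API (sqlContext.read), naming the driver class explicitly in the options. The helper names postgres_jdbc_options and load_table are hypothetical and only used for illustration, and the host, port, database and table values remain placeholders, as above. This still assumes the driver jar has already been added to the Spark instance.

```python
def postgres_jdbc_options(host, port, database, table, user=None, password=None):
    """Build the option dict for Spark's JDBC data source (hypothetical helper)."""
    options = {
        "url": "jdbc:postgresql://{0}:{1}/{2}".format(host, port, database),
        "dbtable": table,
        # Class name of the PostgreSQL JDBC driver; the jar must already be
        # on the classpath (for example, added once from the Scala kernel).
        "driver": "org.postgresql.Driver",
    }
    # Credentials are optional here; add them only when supplied.
    if user is not None:
        options["user"] = user
    if password is not None:
        options["password"] = password
    return options


def load_table(sql_context, **connection):
    """Load a table as a DataFrame through the DataFrameReader API."""
    options = postgres_jdbc_options(**connection)
    return sql_context.read.format("jdbc").options(**options).load()


# Usage in the notebook, with your own connection details substituted:
# df = load_table(sqlContext, host="[publichost]", port="[port]",
#                 database="compose", table="[tablename]")
# df.take(2)
```

Keeping the option building separate from the load call makes it easy to check the generated JDBC URL before Spark tries to connect.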
The complete importable notebook is available here:
https://github.com/charles2588/bluemixsparknotebooks/raw/master/Python/python_postgres.ipynb
Thanks,
Charles.