How to execute arbitrary SQL from a pyspark notebook using SQLContext?
I'm trying a basic test case: read data from dashDB into Spark, then write it back to dashDB.
Step 1. First, in the notebook, I read the data:
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
dashdata = sqlContext.read.jdbc(
url="jdbc:db2://bluemix05.bluforcloud.com:50000/BLUDB:user=****;password=****;",
table="GOSALES.BRANCH"
).cache()
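If you would rather not embed the credentials in the URL, pyspark's `DataFrameReader.jdbc` also accepts a `properties` dict. A minimal sketch, assuming the standard `user`, `password` and `driver` connection properties; `db2_jdbc_options` and `read_db2_table` are hypothetical helper names, not part of pyspark:

```python
def db2_jdbc_options(host, port, database, user, password):
    """Build a DB2 JDBC URL and a separate properties dict (hypothetical helper)."""
    url = "jdbc:db2://{0}:{1}/{2}".format(host, port, database)
    properties = {
        "user": user,
        "password": password,
        "driver": "com.ibm.db2.jcc.DB2Driver",  # DB2 JCC driver class
    }
    return url, properties


def read_db2_table(sql_context, table, url, properties):
    """Read one table over JDBC and cache it, as in step 1 (hypothetical helper)."""
    # Nothing touches Spark until this function is actually called.
    return sql_context.read.jdbc(url=url, table=table, properties=properties).cache()
```

For example, `url, props = db2_jdbc_options("bluemix05.bluforcloud.com", 50000, "BLUDB", "<user>", "<password>")` followed by `dashdata = read_db2_table(sqlContext, "GOSALES.BRANCH", url, props)`. The same `url` and `props` can also be passed to the write in step 3.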
Step 2. Then, in dashDB, I create the target table:
DROP TABLE ****.FROM_SPARK;
CREATE TABLE ****.FROM_SPARK AS (
SELECT *
FROM GOSALES.BRANCH
) WITH NO DATA
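When the table names change between runs, the same two statements can be generated from Python. A small sketch with a hypothetical helper `empty_copy_ddl`. It only builds strings, and because table names cannot be bound as SQL parameters it should only be given trusted names. DB2 rejects a plain `DROP TABLE` when the target does not exist yet, which is why the drop is optional:

```python
def empty_copy_ddl(source_table, target_table, drop_existing=True):
    """Return the step 2 statements (hypothetical helper).

    CREATE TABLE ... AS (SELECT * ...) WITH NO DATA copies only the column
    definitions of source_table, producing an empty target table.
    """
    statements = []
    if drop_existing:
        # Fails in DB2 if target_table does not exist, so allow skipping it.
        statements.append("DROP TABLE {0}".format(target_table))
    statements.append(
        "CREATE TABLE {0} AS (SELECT * FROM {1}) WITH NO DATA".format(
            target_table, source_table))
    return statements
```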
Step 3. Finally, in the notebook, I save the data to the table:
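from pyspark.sql import DataFrameWriter
writer = DataFrameWriter(dashdata)  # equivalent to dashdata.write
# jdbc() returns None, so don't assign its result back to dashdata
writer.jdbc(
url="jdbc:db2://bluemix05.bluforcloud.com:50000/BLUDB:user=****;password=****;",
table="****.FROM_SPARK",
mode="append"  # the table already exists (step 2); the default mode raises an error
)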
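Question: is it possible to run the SQL from step 2 from pyspark? I couldn't see how to do this from the pyspark documentation. I don't want to use vanilla Python to connect to dashDB because of the effort involved in setting up the libraries.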
Use ibmdbpy. See this brief demo.
With as_idadataframe(), you can upload a DataFrame to dashDB as a table.
Adding the key steps here, since Stack Overflow doesn't like link-only answers:
Step 1: Add a cell containing the following:
#!pip install --user future
#!pip install --user lazy
#!pip install --user jaydebeapi
#!pip uninstall --yes ibmdbpy
#!pip install ibmdbpy --user --no-deps
#!wget -O $HOME/.local/lib/python2.7/site-packages/ibmdbpy/db2jcc4.jar https://ibm.box.com/shared/static/lmhzyeslp1rqns04ue8dnhz2x7fb6nkc.zip
Step 2: Then, from another notebook cell:
from ibmdbpy import IdaDataBase
idadb = IdaDataBase('jdbc:db2://<dashdb server name>:50000/BLUDB:user=<dashdb user>;password=<dashdb pw>')
....
Yes, you can create a table in dashDB from a notebook.
Below is the code for Scala:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
import org.apache.log4j.Logger
import org.apache.log4j.Level
import java.sql.Connection
import java.sql.DriverManager
import java.sql.SQLException
import com.ibm.db2.jcc._
import java.io._
val jdbcClassName = "com.ibm.db2.jcc.DB2Driver"
// Enter the host (or IP) from your dashDB connection settings
val url = "jdbc:db2://awh-yp-small02.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;"
val user = "<username>"
val password = "<password>"
Class.forName(jdbcClassName)  // register the DB2 JDBC driver
val connection = DriverManager.getConnection(url, user, password)
try {
  // Turn off autocommit so that the explicit commit() below is valid JDBC
  connection.setAutoCommit(false)
  val stmt = connection.createStatement()
  try {
    stmt.executeUpdate("CREATE TABLE COL12345(" +
      "month VARCHAR(82))")
  } finally {
    stmt.close()
  }
  connection.commit()
} finally {
  connection.close()
}
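Since the question asks for pyspark, roughly the same JDBC calls can be reached from Python through the py4j gateway that pyspark already uses, without installing extra Python libraries. A minimal sketch, assuming `sc._jvm` (a private, undocumented SparkContext attribute) and assuming the DB2 JCC driver jar is on the driver JVM's classpath, which it must be for the `read.jdbc` call in step 1 to work. `split_sql_statements` and `execute_sql_via_jvm` are hypothetical helper names. The splitter is needed because JDBC's `executeUpdate` runs one statement at a time, while step 2 has two:

```python
def split_sql_statements(script):
    """Split a SQL script on semicolons outside single-quoted literals."""
    statements, current, in_quote = [], [], False
    for ch in script:
        if ch == "'":
            # A doubled '' escape toggles twice, so the state stays correct.
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


def execute_sql_via_jvm(sc, url, script, user, password,
                        driver_class="com.ibm.db2.jcc.DB2Driver"):
    """Run each statement of `script` over plain JDBC inside the driver JVM."""
    jvm = sc._jvm  # private py4j view of the driver JVM (assumption)
    jvm.java.lang.Class.forName(driver_class)  # register the driver, as in the Scala code
    conn = jvm.java.sql.DriverManager.getConnection(url, user, password)
    try:
        stmt = conn.createStatement()
        try:
            for sql in split_sql_statements(script):
                # JDBC connections start in autocommit mode, so each
                # statement is committed as soon as it runs.
                stmt.executeUpdate(sql)
        finally:
            stmt.close()
    finally:
        conn.close()

# Example (needs a live SparkContext `sc` and real credentials):
# execute_sql_via_jvm(sc, "jdbc:db2://<host>:50000/BLUDB",
#     "DROP TABLE ****.FROM_SPARK; "
#     "CREATE TABLE ****.FROM_SPARK AS (SELECT * FROM GOSALES.BRANCH) WITH NO DATA",
#     "<user>", "<password>")
```

Defining the helpers does not touch Spark; only calling `execute_sql_via_jvm` needs the live gateway.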