Pyspark azuresql
I am using pyspark on macOS and trying to read from Azure SQL, but I get the error shown below.
Spark 2.4.6; Scala 2.11; Java 1.8.0_251
pyspark --jars spark-mssql-connector_2.11_2.4-1.0.2.jar
dbname = "db-test"
servername = "jdbc:sqlserver://" + "samplesql.database.windows.net:1433"
url = servername + ";" + "database_name=" + dbname + ";"
df = spark.read \
.format("com.microsoft.sqlserver.jdbc.spark") \
.option("url", url) \
.option("dbtable", table_name) \
.option("authentication", "ActiveDirectoryPassword") \
.option("hostNameInCertificate", "*.database.windows.net") \
.option("user", aduser) \
.option("password", adpwd) \
.option("encrypt", "true").load()
: java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun.apply(JDBCOptions.scala:105)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions$$anonfun.apply(JDBCOptions.scala:105)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:104)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:35)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
What am I missing here?
Try using the --packages flag:
pyspark --packages com.microsoft.azure:spark-mssql-connector_2.11_2.4:1.0.2
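For a standalone script rather than the interactive shell, the same Maven coordinate can be passed through the `spark.jars.packages` setting. The following is only a sketch under some assumptions: the helper names `build_jdbc_url` and `build_session` are hypothetical, and the URL uses `databaseName`, the property name documented for the Microsoft JDBC driver, instead of the `database_name` key from the question. A plausible reason `--packages` helps is that it also resolves the connector's dependencies (including the SQL Server JDBC driver), whereas `--jars` adds only the single jar file it is given.

```python
# Maven coordinate taken from the answer above (group:artifact:version).
CONNECTOR_COORDINATE = "com.microsoft.azure:spark-mssql-connector_2.11_2.4:1.0.2"


def build_jdbc_url(server, database, port=1433):
    """Hypothetical helper: assemble a SQL Server JDBC URL.

    Uses `databaseName`, the property name documented for the Microsoft
    JDBC driver, rather than `database_name` as written in the question.
    """
    return "jdbc:sqlserver://{}:{};databaseName={};".format(server, port, database)


def build_session(app_name="azuresql-read"):
    """Hypothetical helper: start a session with the connector resolved from Maven."""
    # Imported lazily so the URL helper can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Same effect as `--packages` on the command line; it only takes
        # effect if no Spark JVM is already running in this process.
        .config("spark.jars.packages", CONNECTOR_COORDINATE)
        .getOrCreate()
    )


if __name__ == "__main__":
    print(build_jdbc_url("samplesql.database.windows.net", "db-test"))
```

The session returned by `build_session()` could then be used in place of `spark` in the read call from the question, with `build_jdbc_url(...)` supplying the `url` option.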