pyspark.sql.utils.AnalysisException: Table not found: test_result;
I'm trying to read a file from an S3 bucket with PySpark and write the DataFrame to a PostgreSQL table, but I get the following error.
Code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('sample_v2').getOrCreate()
path = ['s3a://path/sample_data.csv']
df = spark.read.csv(path, sep=',',inferSchema=True, header=True)
df.show()  # works until here, df has data
df.write.format("jdbc").option("driver","org.postgresql.Driver").option("url","jdbc:postgres://********************rds.amazonaws.com:5432;database=abc;user=abcde;password=abcdef").insertInto("test_result")
Error:
22/04/06 12:15:31 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
22/04/06 12:15:31 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
22/04/06 12:15:34 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
22/04/06 12:15:34 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore UNKNOWN@192.168.29.14
22/04/06 12:15:34 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Softwares\spark-3.2.1-bin-hadoop3.2\spark\python\pyspark\sql\readwriter.py", line 762, in insertInto
self._jwrite.insertInto(tableName)
File "C:\Softwares\spark-3.2.1-bin-hadoop3.2\spark\python\lib\py4j-0.10.9.3-src.zip\py4j\java_gateway.py", line 1321, in __call__
File "C:\Softwares\spark-3.2.1-bin-hadoop3.2\spark\python\pyspark\sql\utils.py", line 117, in deco
raise converted from None
pyspark.sql.utils.AnalysisException: Table not found: test_result;
'InsertIntoStatement 'UnresolvedRelation [test_result], [], false, false, false
How can I fix this?
You should use jdbc:postgresql:// instead of jdbc:postgres://.
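The PostgreSQL JDBC driver (org.postgresql.Driver) only recognises URLs that start with jdbc:postgresql:, so a URL beginning with jdbc:postgres:// is rejected by the driver. A minimal pre-flight check, assuming you want to fail fast before Spark tries to connect; the function name check_postgres_jdbc_url is hypothetical:

```python
def check_postgres_jdbc_url(url: str) -> str:
    """Return url unchanged if it uses the PostgreSQL JDBC scheme, else raise.

    Hypothetical helper: org.postgresql.Driver only recognises URLs that
    start with "jdbc:postgresql:", so catch the common "jdbc:postgres:" typo early.
    """
    if url.startswith("jdbc:postgresql:"):
        return url
    if url.startswith("jdbc:postgres:"):
        # Suggest the corrected scheme while keeping the rest of the URL intact.
        fixed = "jdbc:postgresql:" + url[len("jdbc:postgres:"):]
        raise ValueError(f"Unsupported JDBC scheme; did you mean {fixed!r}?")
    raise ValueError(f"Not a PostgreSQL JDBC URL: {url!r}")
```

For example, check_postgres_jdbc_url("jdbc:postgres://localhost:5432/abc") raises a ValueError whose message suggests jdbc:postgresql://localhost:5432/abc.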
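The error says "pyspark.sql.utils.AnalysisException: Table not found: test_result;", but the problem may lie in establishing the connection from Spark to Postgres. To check the connection, run a trivial query: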
df1=spark.read.format("jdbc").option("driver","org.postgresql.Driver").option("url","jdbc:postgresql://********************rds.amazonaws.com:5432;database=abc;user=abcde;password=abcdef").option("query", "select 1").load()
df1.show()
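If the statement above returns a result, the connection works and the problem lies elsewhere; for example, the user may not have access to the table.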
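Separately, DataFrameWriter.insertInto() resolves the table name against Spark's own catalog (here apparently a local Hive metastore, judging by the HiveConf/ObjectStore warnings), not against the JDBC database, so the url given to the JDBC format is never used to look up test_result. That matches the 'InsertIntoStatement 'UnresolvedRelation [test_result] line in the error. Writing through JDBC goes through save() with a dbtable option instead. A minimal sketch, assuming appending is the intended behaviour; jdbc_options and append_to_postgres are hypothetical names:

```python
def jdbc_options(url, table, user, password, driver="org.postgresql.Driver"):
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    return {
        "driver": driver,
        "url": url,           # must start with jdbc:postgresql://
        "dbtable": table,     # e.g. "public.test_result"; resolved by Postgres, not Spark
        "user": user,
        "password": password,
    }


def append_to_postgres(df, options):
    """Append the rows of a PySpark DataFrame to the table named in options["dbtable"].

    Uses save() rather than insertInto(): insertInto() looks the table up in
    Spark's catalog, not in the JDBC database.
    """
    df.write.format("jdbc").options(**options).mode("append").save()
```

Usage, with a placeholder host: append_to_postgres(df, jdbc_options("jdbc:postgresql://<host>:5432/abc", "public.test_result", "abcde", "abcdef")).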
I tried the syntax above, but connecting to Postgres failed with org.postgresql.util.PSQLException: The server requested password-based authentication, but no password was provided.
So I used the following syntax instead:
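The likely culprit for that PSQLException is the URL rather than the credentials: the snippet above uses a semicolon-separated ;database=...;user=...;password=... style (resembling SQL Server JDBC URLs), whereas the PostgreSQL JDBC driver documents the form jdbc:postgresql://host:port/database, optionally followed by ?key=value parameters. A minimal sketch of a URL builder, assuming credentials are passed separately through Spark's user and password options; pg_jdbc_url is a hypothetical name:

```python
from urllib.parse import urlencode


def pg_jdbc_url(host, port=5432, database="postgres", **params):
    """Build a URL in the documented PostgreSQL JDBC form (hypothetical helper).

    jdbc:postgresql://host:port/database[?key=value&...]
    Extra keyword arguments (e.g. sslmode="require") become query parameters.
    """
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    if params:
        url += "?" + urlencode(params)
    return url
```

For example, pg_jdbc_url("xxxxxxxxxxxx.rds.amazonaws.com", 5432, "database") gives the same URL used in the read and write code that follows.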
Read from the database:
source_db_url = "jdbc:postgresql://xxxxxxxxxxxx.rds.amazonaws.com:5432/database"
db_driver="org.postgresql.Driver"
db_user = "admin"
db_password = "admin"
df1=spark.read.format("jdbc").option("driver", db_driver).option("url", source_db_url).option("query", "select 1").option("user", db_user).option("password", db_password).load()
df1.show()
Write to the database:
source_db_url = "jdbc:postgresql://xxxxxxxxxxxx.rds.amazonaws.com:5432/database"
db_driver="org.postgresql.Driver"
db_user = "admin"
db_password = "admin"
# df is the DataFrame read from S3 in the question (df1 above only holds the "select 1" test result)
df.write.format("jdbc").mode("overwrite").option("truncate", "true").option("driver", db_driver).option("url", source_db_url).option("dbtable", "public.test_result").option("user", db_user).option("password", db_password).save()
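With mode("overwrite"), Spark's JDBC writer by default drops and recreates the target table; setting truncate to true makes it truncate the existing table instead, so the Postgres table definition is kept. The option only applies in overwrite mode. A small sketch that picks the writer settings from the desired behaviour; the helper names and the replace_rows flag are hypothetical:

```python
def jdbc_save_settings(replace_rows: bool):
    """Return (save_mode, extra_options) for Spark's JDBC writer (hypothetical helper).

    replace_rows=True  -> overwrite, truncating instead of dropping/recreating
                          so the existing Postgres table definition is preserved.
    replace_rows=False -> append to the existing rows.
    """
    if replace_rows:
        return "overwrite", {"truncate": "true"}
    return "append", {}


def write_df(df, base_options, replace_rows=False):
    """Write a PySpark DataFrame with the JDBC options in base_options."""
    mode, extra = jdbc_save_settings(replace_rows)
    (df.write.format("jdbc")
        .options(**base_options, **extra)
        .mode(mode)
        .save())
```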
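If you get the error py4j.protocol.Py4JJavaError: An error occurred while calling o98.save. : java.lang.ClassNotFoundException: org.postgresql.Driver, download the PostgreSQL JDBC driver from https://jdbc.postgresql.org/download.html, add the jar to the Spark session as follows, and replace the database configuration values with your own.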
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName("Python Spark SQL basic example") \
.config("spark.jars", "/path_to_postgresDriver/postgresql-42.2.5.jar") \
.getOrCreate()
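Instead of downloading the jar by hand, Spark can fetch the driver from a Maven repository through the spark.jars.packages setting (coordinate org.postgresql:postgresql:<version>). Like spark.jars, it generally only takes effect if set before the first SparkContext starts in the process; in an already running pyspark shell, pass --packages or --jars on the command line instead. A sketch, assuming network access to Maven Central and using 42.2.5 only because it is the version in the snippet above; the helper names are hypothetical:

```python
def pg_driver_package(version="42.2.5"):
    """Maven coordinate of the PostgreSQL JDBC driver for spark.jars.packages."""
    return f"org.postgresql:postgresql:{version}"


def build_session(app_name="sample_v2", driver_version="42.2.5"):
    # Imported here so pg_driver_package can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # Resolved and added to the driver/executor classpaths when the
        # underlying SparkContext is first created.
        .config("spark.jars.packages", pg_driver_package(driver_version))
        .getOrCreate()
    )
```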
Syntax for the whole thing (the port belongs in the URL; there is no separate port option):
df.write.format("jdbc").option("driver", "org.postgresql.Driver").option("url", "jdbc:postgresql://*************************ast-1.rds.amazonaws.com:5432/dbname").option("dbtable", "public.table_name").option("user", "abc").option("password", "abc").save()
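Putting the pieces together for the original task (CSV in S3 to a Postgres table), a minimal sketch assuming the driver jar and the S3A credentials are already configured on the SparkSession; the function name s3_csv_to_postgres is hypothetical:

```python
def s3_csv_to_postgres(spark, s3_path, jdbc_url, table, user, password, mode="append"):
    """Read a CSV file from S3 and write it to a PostgreSQL table (hypothetical helper).

    Assumes the PostgreSQL driver jar and the S3A credentials are already
    configured on the given SparkSession.
    """
    # Fail fast on the jdbc:postgres:// typo before touching Spark.
    if not jdbc_url.startswith("jdbc:postgresql://"):
        raise ValueError("expected a URL of the form jdbc:postgresql://host:port/database")

    df = spark.read.csv(s3_path, sep=",", inferSchema=True, header=True)

    (df.write.format("jdbc")
        .option("driver", "org.postgresql.Driver")
        .option("url", jdbc_url)       # host, port and database all live in the URL
        .option("dbtable", table)      # looked up by Postgres, not by Spark's catalog
        .option("user", user)
        .option("password", password)
        .mode(mode)
        .save())                       # save(), not insertInto()
```

Usage, with a placeholder host: s3_csv_to_postgres(spark, "s3a://path/sample_data.csv", "jdbc:postgresql://<host>:5432/abc", "public.test_result", "abcde", "abcdef").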