Pyspark: Read data from AWS:S3 bucket and write to postgres table
I am trying to read data from an S3 bucket and want to write/load it into a postgres table. My code is -
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Read Multiple CSV Files').getOrCreate()
path = ['C://Projects/Sandbox/file2.csv']
files = spark.read.csv(path, sep=',',inferSchema=True, header=True)
df1 = files.toPandas()
from pyspark.sql import DataFrameWriter
my_writer = DataFrameWriter(df1)
mode = "overwrite"
url = ""
properties = {"user": "","password": "","driver": "org.postgresql.Driver"}
my_writer.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
On the line
my_writer = DataFrameWriter(files)
the error it gives is -
AttributeError: 'DataFrameWriter' object has no attribute 'write'
and when df1 is passed as the argument to DataFrameWriter() -
my_writer = DataFrameWriter(df1)
AttributeError: 'DataFrame' object has no attribute 'sql_ctx'
Am I doing something wrong anywhere?
There is no need to create a new instance of DataFrameWriter; a Spark DataFrame already exposes this interface through its write attribute. You can use this attribute to export the CSV data over a JDBC connection:
# Read the data from the source
files = spark.read.csv(path, sep=',', inferSchema=True, header=True)
# Write the data to destination using jdbc connection
files.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)
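The url and the credentials in properties were left blank in the question, so for reference, here is a minimal sketch of what the JDBC parameters might look like against a local Postgres instance (the host, port, database name, and credentials are assumptions, not values from the question):
# Hypothetical connection details, replace with your own
url = "jdbc:postgresql://localhost:5432/sandbox"   # assumed host/port/database
mode = "overwrite"
properties = {
    "user": "postgres",    # assumed user
    "password": "secret",  # assumed password
    "driver": "org.postgresql.Driver",
}
files.write.jdbc(url=url, table="test_result", mode=mode, properties=properties)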
How to fix the existing code?
Create a new instance of DataFrameWriter from files, then use my_writer.jdbc to export the data over the JDBC connection:
my_writer = DataFrameWriter(files)
my_writer.jdbc(url=url, table="test_result", mode=mode, properties=properties)
# ^^^^^^ No need to use .write attribute
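Both forms are equivalent: on a Spark DataFrame, the write property simply returns a new DataFrameWriter wrapping that DataFrame. That is also why the first error above occurs, since an object that is already a DataFrameWriter has no write attribute of its own.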
The following solution is correct once the toPandas() conversion is dropped:
from pyspark.sql import SparkSession
from pyspark.sql import DataFrameWriter

spark = SparkSession.builder.appName('Read Multiple CSV Files').getOrCreate()
path = ['C://Projects/Sandbox/file2.csv']
files = spark.read.csv(path, sep=',', inferSchema=True, header=True)

mode = "overwrite"
url = ""
properties = {"user": "", "password": "", "driver": "org.postgresql.Driver"}
# Pass the Spark DataFrame itself; converting it with toPandas() first
# is what caused the 'sql_ctx' AttributeError
my_writer = DataFrameWriter(files)
my_writer.jdbc(url=url, table="test_result", mode=mode, properties=properties)
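Note that path in the question points at a local file even though the title asks about S3, and the empty url assumes the Postgres JDBC driver is already on the classpath. As a rough sketch (the bucket name and package versions below are assumptions, not values from the question), reading directly from S3 with the s3a connector while pulling in the JDBC driver could look like this:
from pyspark.sql import SparkSession

# Versions are illustrative; match hadoop-aws to your Spark/Hadoop build.
# AWS credentials are typically picked up from the environment,
# ~/.aws/credentials, or an instance profile.
spark = (SparkSession.builder
         .appName('Read Multiple CSV Files')
         .config('spark.jars.packages',
                 'org.postgresql:postgresql:42.6.0,'
                 'org.apache.hadoop:hadoop-aws:3.3.4')
         .getOrCreate())

# Hypothetical bucket and key, replace with your own
files = spark.read.csv('s3a://my-bucket/file2.csv',
                       sep=',', inferSchema=True, header=True)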