Unload multiple files from Redshift to S3
Hi, I am trying to unload multiple tables from Redshift into a specific S3 bucket and I am getting the following error:
psycopg2.InternalError: Specified unload destination on S3 is not empty. Consider using a different bucket / prefix, manually removing the target files in S3, or using the ALLOWOVERWRITE option.
If I add the 'allowoverwrite' option to unload_function, each unload overwrites the previous table's files, and only the last table is left in S3.
Here is the code I am using:
import psycopg2

def unload_data(r_conn, aws_iam_role, datastoring_path, region, table_name):
    unload = '''unload ('select * from {}')
                to '{}'
                credentials 'aws_iam_role={}'
                manifest
                gzip
                delimiter ',' addquotes escape parallel off '''.format(table_name, datastoring_path, aws_iam_role)

    print("Exporting table to datastoring_path")
    cur = r_conn.cursor()
    cur.execute(unload)
    r_conn.commit()

def main():
    host_rs = 'dataingestion.*********.us******2.redshift.amazonaws.com'
    port_rs = '5439'
    database_rs = '******'
    user_rs = '******'
    password_rs = '********'
    rs_tables = ['Employee', 'Employe_details']
    iam_role = 'arn:aws:iam::************:role/RedshiftCopyUnload'
    s3_datastoring_path = 's3://mysamplebuck/'
    s3_region = 'us_*****_2'

    print("Exporting from source")
    src_conn = psycopg2.connect(host=host_rs,
                                port=port_rs,
                                database=database_rs,
                                user=user_rs,
                                password=password_rs)
    print("Connected to RS")

    for i, tabe in enumerate(rs_tables):
        if tabe[0] == tabe[-1]:
            print("No files to read!")
        unload_data(src_conn, aws_iam_role=iam_role, datastoring_path=s3_datastoring_path,
                    region=s3_region, table_name=rs_tables[i])
        print(rs_tables[i])

if __name__ == "__main__":
    main()
It is complaining that you are saving the data to the same destination.
This is like copying all the files on your computer into the same directory -- files will be overwritten.
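
With the current loop, every iteration passes the same s3_datastoring_path, so every table's UNLOAD statement ends up with an identical to clause. For the first table it is roughly:

    unload ('select * from Employee')
    to 's3://mysamplebuck/'
    credentials 'aws_iam_role=arn:aws:iam::************:role/RedshiftCopyUnload'
    manifest
    gzip
    delimiter ',' addquotes escape parallel off

The second table then targets 's3://mysamplebuck/' again, which is exactly why Redshift reports that the destination is not empty.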
You should change the datastoring_path to be different for each table, for example:

.format(table_name, datastoring_path + '/' + table_name, aws_iam_role)
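
One way to apply this from the calling loop, as a minimal sketch built on the variables already defined in main() (any naming scheme works as long as each table gets its own prefix):

    for table_name in rs_tables:
        # give every table its own S3 prefix, e.g. s3://mysamplebuck/Employee/
        table_path = s3_datastoring_path + table_name + '/'
        unload_data(src_conn,
                    aws_iam_role=iam_role,
                    datastoring_path=table_path,
                    region=s3_region,
                    table_name=table_name)
        print(table_name)

Each table's manifest and gzip parts then land under their own prefix, so a later UNLOAD never finds its destination already occupied.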