AWS Glue Python Shell 包导入
AWS Glue Python Shell package import
我们创建了一个连接 Redshift 和获取数据的 python shell 作业,下面的程序在我的本地系统中运行良好。
下面是步骤和程序。
节目:-
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "#####"
USER = "#####"
PASSWORD = "#####"
HOST = "#####.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "test" #default is "public"
####### connection and session creation ##############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###### All Set Session created using provided schema #######
################ write queries from here ######################
query = "SELECT * FROM test1 limit 2;"
rr = s.execute(query)
all_results = rr.fetchall()
def pretty(all_results):
for row in all_results :
print("row start >>>>>>>>>>>>>>>>>>>>")
for r in row :
print(" ----" , r)
print("row end >>>>>>>>>>>>>>>>>>>>>>")
pretty(all_results)
########## close session in the end ###############
s.close()
步骤:-
- sudo pip 安装 psycopg2
- sudo pip 安装 sqlalchemy
- sudo pip 安装 sqlalchemy-redshift
我上传了文件 psycopg2-2.8.4-cp27-cp27m-win32.whl, Flask_SQLAlchemy-2.4.1-py2.py3-none-any.whl 和sqlalchemy_redshift-0.7.5-py2.py3-none-any.whl in S3 (s3://####/lib/),映射文件夹在Python AWS Glue 作业中的库路径。
当我运行下面的程序出现错误。
Traceback (most recent call last):
File "/tmp/runscript.py", line 113, in <module>
download_and_install(args.extra_py_files)
File "/tmp/runscript.py", line 56, in download_and_install
download_from_s3(s3_file_path, local_file_path)
File "/tmp/runscript.py", line 81, in download_from_s3
s3.download_file(bucket_name, s3_key, new_file_path)
File "/usr/local/lib/python2.7/site-packages/boto3/s3/inject.py", line 172, in download_file
extra_args=ExtraArgs, callback=Callback)
File "/usr/local/lib/python2.7/site-packages/boto3/s3/transfer.py", line 307, in download_file
future.result()
File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
PS:- Glue 作业角色具有对 S3 的完全访问权限。
请建议如何将这些库与程序映射。
您可以在“—extra-py-files”标志下指定自己的 Python 库打包为 .egg 或 .whl 文件,如下例所示。
命令行示例:
aws glue create-job --name python-redshift-test-cli --role role --command '{"Name" : "pythonshell", "ScriptLocation" : "s3://MyBucket/python/library/redshift_test.py"}'
--connections Connections=connection-name --default-arguments '{"--extra-py-files" : ["s3://MyBucket/python/library/redshift_module-0.1-py2.7.egg", "s3://MyBucket/python/library/redshift_module-0.1-py2.7-none-any.whl"]}'
有一种使用 whl 文件导入 python 依赖项的简单方法,可以在特定模块的 Python 站点上找到。
您还可以使用逗号从 S3 添加多个 wheel 文件。
例如
"s3://xxxxxxxxx/common/glue/glue_whl/fastparquet-0.4.1-cp37-cp37m-macosx_10_9_x86_64.whl,s3://xxxxxx/common/glue/glue_whl/packaging-20.4-py2.py3-none-any.whl,s3://xxxxxx/common/glue/glue_whl/s3fs-0.5.0-py3-none-any.whl
enter image description here
我们创建了一个连接 Redshift 和获取数据的 python shell 作业,下面的程序在我的本地系统中运行良好。 下面是步骤和程序。
节目:-
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "#####"
USER = "#####"
PASSWORD = "#####"
HOST = "#####.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "test" #default is "public"
####### connection and session creation ##############
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###### All Set Session created using provided schema #######
################ write queries from here ######################
query = "SELECT * FROM test1 limit 2;"
rr = s.execute(query)
all_results = rr.fetchall()
def pretty(all_results):
for row in all_results :
print("row start >>>>>>>>>>>>>>>>>>>>")
for r in row :
print(" ----" , r)
print("row end >>>>>>>>>>>>>>>>>>>>>>")
pretty(all_results)
########## close session in the end ###############
s.close()
步骤:-
- sudo pip 安装 psycopg2
- sudo pip 安装 sqlalchemy
- sudo pip 安装 sqlalchemy-redshift
我上传了文件 psycopg2-2.8.4-cp27-cp27m-win32.whl, Flask_SQLAlchemy-2.4.1-py2.py3-none-any.whl 和sqlalchemy_redshift-0.7.5-py2.py3-none-any.whl in S3 (s3://####/lib/),映射文件夹在Python AWS Glue 作业中的库路径。
当我运行下面的程序出现错误。
Traceback (most recent call last):
File "/tmp/runscript.py", line 113, in <module>
download_and_install(args.extra_py_files)
File "/tmp/runscript.py", line 56, in download_and_install
download_from_s3(s3_file_path, local_file_path)
File "/tmp/runscript.py", line 81, in download_from_s3
s3.download_file(bucket_name, s3_key, new_file_path)
File "/usr/local/lib/python2.7/site-packages/boto3/s3/inject.py", line 172, in download_file
extra_args=ExtraArgs, callback=Callback)
File "/usr/local/lib/python2.7/site-packages/boto3/s3/transfer.py", line 307, in download_file
future.result()
File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 106, in result
return self._coordinator.result()
File "/usr/local/lib/python2.7/site-packages/s3transfer/futures.py", line 265, in result
raise self._exception
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
PS:- Glue 作业角色具有对 S3 的完全访问权限。
请建议如何将这些库与程序映射。
您可以在“—extra-py-files”标志下指定自己的 Python 库打包为 .egg 或 .whl 文件,如下例所示。
命令行示例:
aws glue create-job --name python-redshift-test-cli --role role --command '{"Name" : "pythonshell", "ScriptLocation" : "s3://MyBucket/python/library/redshift_test.py"}'
--connections Connections=connection-name --default-arguments '{"--extra-py-files" : ["s3://MyBucket/python/library/redshift_module-0.1-py2.7.egg", "s3://MyBucket/python/library/redshift_module-0.1-py2.7-none-any.whl"]}'
有一种使用 whl 文件导入 python 依赖项的简单方法,可以在特定模块的 Python 站点上找到。
您还可以使用逗号从 S3 添加多个 wheel 文件。
例如 "s3://xxxxxxxxx/common/glue/glue_whl/fastparquet-0.4.1-cp37-cp37m-macosx_10_9_x86_64.whl,s3://xxxxxx/common/glue/glue_whl/packaging-20.4-py2.py3-none-any.whl,s3://xxxxxx/common/glue/glue_whl/s3fs-0.5.0-py3-none-any.whl
enter image description here