How to link an S3 bucket to a SageMaker notebook
I am trying to link my S3 bucket to my notebook instance, but I can't.
Here is what I have so far:
from sagemaker import get_execution_role

role = get_execution_role()

bucket = 'atwinebankloadrisk'
data_location = 's3://{}/'.format(bucket)
output_location = 's3://{}/'.format(bucket)
Calling the data in the bucket:
df_test = pd.read_csv(data_location/'application_test.csv')
df_train = pd.read_csv('./application_train.csv')
df_bureau = pd.read_csv('./bureau_balance.csv')
But I keep getting error messages and cannot move forward.
I have not yet found an answer that helps much.
PS: I am new to AWS.
You are trying to use Pandas to read files from S3. Pandas can read files from local disk, but it cannot read directly from S3 unless an S3-aware backend such as s3fs is installed (see the other answers).
The simplest fix is to download the files from S3 to your local disk first, then read them with Pandas.
import boto3
import botocore

BUCKET_NAME = 'my-bucket'       # replace with your bucket name
KEY = 'my_image_in_s3.jpg'      # replace with your object key

s3 = boto3.resource('s3')

try:
    # download as a local file
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
    # OR read directly into memory as bytes:
    # bytes = s3.Object(BUCKET_NAME, KEY).get()['Body'].read()
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
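To finish the job with pandas, read the downloaded copy (or the in-memory bytes) as usual. A minimal sketch, assuming the question's CSV sits at the top level of the atwinebankloadrisk bucket:

import io
import boto3
import pandas as pd

s3 = boto3.resource('s3')

# Option 1: download to local disk, then read the local copy.
s3.Bucket('atwinebankloadrisk').download_file('application_test.csv', 'application_test.csv')
df_test = pd.read_csv('application_test.csv')

# Option 2: read the object body into memory and wrap it for pandas.
body = s3.Object('atwinebankloadrisk', 'application_test.csv').get()['Body'].read()
df_test = pd.read_csv(io.BytesIO(body))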
You can use s3fs (https://s3fs.readthedocs.io/en/latest/) to read S3 files directly with pandas:
import os

import pandas as pd
from s3fs.core import S3FileSystem

os.environ['AWS_CONFIG_FILE'] = 'aws_config.ini'

s3 = S3FileSystem(anon=False)
key = 'path/to/your-csv.csv'    # S3 keys use forward slashes
bucket = 'your-bucket-name'

df = pd.read_csv(s3.open('{}/{}'.format(bucket, key), mode='rb'))
You can use the sample code below to load S3 data into an AWS SageMaker notebook. Make sure the Amazon SageMaker role has a policy attached that grants it access to S3 [1].
[1] https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html
import boto3
import botocore
import pandas as pd
from sagemaker import get_execution_role

role = get_execution_role()

bucket = 'Your_bucket_name'
data_key = 'your_data_file.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

pd.read_csv(data_location)
import boto3

# Files are referred to as objects in S3.
# A file name is referred to as a key name in S3.
def write_to_s3(filename, bucket_name, key):
    with open(filename, 'rb') as f:  # read in binary mode
        return boto3.Session().resource('s3').Bucket(bucket_name).Object(key).upload_fileobj(f)

# Simply call the write_to_s3 function with the required arguments:
write_to_s3('file_name.csv',
            bucket_name,
            'file_name.csv')
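For the reverse direction, a symmetric helper can pull an object back down to local disk. read_from_s3 is just an illustrative name here, not part of boto3:

import boto3

# Hypothetical counterpart to write_to_s3 above: copies the S3 object
# bucket_name/key to a local file named filename.
def read_from_s3(filename, bucket_name, key):
    boto3.Session().resource('s3').Bucket(bucket_name).download_file(key, filename)

read_from_s3('file_name.csv', bucket_name, 'file_name.csv')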
As of pandas 1.0.5, if you have already granted your notebook instance access, reading a csv from S3 is as simple as the line below (pandas relies on the s3fs package under the hood, so it must be installed; see https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#reading-remote-files):
df = pd.read_csv('s3://<bucket-name>/<filepath>.csv')
During notebook setup, I attached the SageMakerFullAccess policy to the notebook instance, which grants it access to the S3 bucket. You can also do this through the IAM management console.
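If you would rather script the permission instead of clicking through the console, here is a minimal sketch using boto3's IAM client; the role name, policy name, and bucket are placeholder assumptions:

import json
import boto3

ROLE_NAME = 'AmazonSageMaker-ExecutionRole-example'  # placeholder: your execution role
BUCKET = 'atwinebankloadrisk'                        # placeholder: your bucket

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::{}".format(BUCKET),
            "arn:aws:s3:::{}/*".format(BUCKET),
        ],
    }],
}

# Attach the policy inline to the notebook's execution role.
boto3.client('iam').put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName='notebook-s3-access',
    PolicyDocument=json.dumps(policy),
)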
If you need credentials, there are three ways to provide them (https://s3fs.readthedocs.io/en/latest/#credentials):
- the aws_access_key_id, aws_secret_access_key, and aws_session_token environment variables
- configuration files such as ~/.aws/credentials
- for nodes on EC2, the IAM metadata provider
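Credentials can also be passed explicitly instead of being picked up from the environment. A minimal sketch with placeholder keys (S3FileSystem accepts key/secret/token arguments, and pandas >= 1.2 forwards storage_options to s3fs):

import pandas as pd
from s3fs.core import S3FileSystem

# Placeholder credentials: prefer the notebook's IAM role or ~/.aws/credentials in practice.
creds = {'key': 'YOUR_ACCESS_KEY_ID', 'secret': 'YOUR_SECRET_ACCESS_KEY'}

# Explicit credentials via s3fs:
s3 = S3FileSystem(anon=False, **creds)
df = pd.read_csv(s3.open('your-bucket-name/path/to/your-csv.csv', mode='rb'))

# Or, with pandas >= 1.2, pass them straight to read_csv:
df = pd.read_csv('s3://your-bucket-name/path/to/your-csv.csv', storage_options=creds)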