Error running a training job with AWS SageMaker

I am trying to use my own scikit-learn ML model with SageMaker, following the GitHub example.

The Python code is as follows:

import boto3
import re
import os
import numpy as np
import pandas as pd
from time import gmtime, strftime

import sagemaker as sage
from sagemaker import get_execution_role

# Define IAM role (the role attached to this notebook instance)
role = get_execution_role()

sess = sage.Session()
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/decision-trees-sample:latest'.format(account, region)

# Where model artifacts are written; must be an s3:// URI in a bucket the role can write to
output_path = "s3://output"

# SageMaker Python SDK v1 positional arguments: image, role, instance count, instance type
tree = sage.estimator.Estimator(image,
                                role, 1, 'ml.c4.2xlarge',
                                output_path=output_path,
                                sagemaker_session=sess)

tree.fit("s3://output/iris.csv")

But I get this error:

INFO:sagemaker:Creating training-job with name: decision-trees-sample-2018-04-24-13-13-38-281

---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
 in ()
     14 sagemaker_session=sess)
     15 
---> 16 tree.fit("s3://inteldatastore-cyrine/iris.csv")

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    161 self.output_path = 's3://{}/'.format(self.sagemaker_session.default_bucket())
    162 
--> 163 self.latest_training_job = _TrainingJob.start_new(self, inputs)
    164 if wait:
    165 self.latest_training_job.wait(logs=logs)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs)
    336 input_config=input_config, role=role, job_name=estimator._current_job_name,
    337 output_config=output_config, resource_config=resource_config,
--> 338 hyperparameters=hyperparameters, stop_condition=stop_condition)
    339 
    340 return cls(estimator.sagemaker_session, estimator._current_job_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in train(self, image, input_mode, input_config, role, job_name, output_config, resource_config, hyperparameters, stop_condition)
    242 LOGGER.info('Creating training-job with name: {}'.format(job_name))
    243 LOGGER.debug('train request: {}'.format(json.dumps(train_request, indent=4)))
--> 244 self.sagemaker_client.create_training_job(**train_request)
    245 
    246 def create_model(self, name, role, primary_container):

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    312 "%s() only accepts keyword arguments." % py_operation_name)
    313 # The "self" in this scope is referring to the BaseClient.
--> 314 return self._make_api_call(operation_name, kwargs)
    315 
    316 _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/python3/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    610 error_code = parsed_response.get("Error", {}).get("Code")
    611 error_class = self.exceptions.from_code(error_code)
--> 612 raise error_class(parsed_response, operation_name)
    613 else:
    614 return parsed_response

ClientError: An error occurred (AccessDeniedException) when calling the CreateTrainingJob operation: User: arn:aws:sts::307504647302:assumed-role/default/SageMaker is not authorized to perform: sagemaker:CreateTrainingJob on resource: arn:aws:sagemaker:eu-west-1:307504647302:training-job/decision-trees-sample-2018-04-24-13-13-38-281

Can you help me fix this?

Thanks

It seems you are not authorized to access the resource

arn:aws:sagemaker:eu-west-1:307504647302:training-job/decision-trees-sample-2018-04-24-13-13-38-281

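Could you check that the resource ARN is correct and that the role has the appropriate IAM permissions? This is an IAM authorization error, so it is governed by the policies attached to the role, not by security groups.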
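One way to confirm this diagnosis is IAM's policy simulator. The error names the caller as an STS assumed-role ARN (assumed-role/default/SageMaker), while the simulator expects the IAM role ARN. The sketch below is illustrative, not part of the original answer: it converts one ARN into the other, assuming the role uses the default path "/" (the STS ARN does not carry the path), and wraps boto3's IAM simulate_principal_policy call. The helper names role_arn_from_assumed_role and is_action_allowed are hypothetical, and running the check requires the iam:SimulatePrincipalPolicy permission.

```python
import re

# Matches STS assumed-role ARNs, e.g.
# arn:aws:sts::123456789012:assumed-role/RoleName/SessionName
_ASSUMED_ROLE_RE = re.compile(
    r"^arn:(?P<partition>[^:]+):sts::(?P<account>\d{12}):"
    r"assumed-role/(?P<role>[^/]+)/(?P<session>.+)$"
)


def role_arn_from_assumed_role(assumed_role_arn):
    """Turn an STS assumed-role ARN into the matching IAM role ARN.

    Assumption: the role lives under the default path "/", because the
    STS ARN does not include the role's path.
    """
    match = _ASSUMED_ROLE_RE.match(assumed_role_arn)
    if match is None:
        raise ValueError("not an assumed-role ARN: %s" % assumed_role_arn)
    return "arn:{}:iam::{}:role/{}".format(
        match.group("partition"), match.group("account"), match.group("role")
    )


def is_action_allowed(iam_client, role_arn, action, resource_arn):
    """Ask the IAM policy simulator whether role_arn may call action on resource_arn.

    iam_client is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=role_arn,
        ActionNames=[action],
        ResourceArns=[resource_arn],
    )
    results = response["EvaluationResults"]
    # EvalDecision is one of "allowed", "explicitDeny", "implicitDeny"
    return bool(results) and all(r["EvalDecision"] == "allowed" for r in results)


if __name__ == "__main__":
    # Principal taken from the AccessDeniedException in the question
    print(role_arn_from_assumed_role(
        "arn:aws:sts::307504647302:assumed-role/default/SageMaker"))
```

With boto3 installed, calling is_action_allowed(boto3.client("iam"), role_arn, "sagemaker:CreateTrainingJob", <training-job ARN from the error>) and getting False would confirm that the role itself lacks the permission, rather than the ARN being wrong.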

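I ran into similar problems when I started using SageMaker, so I developed this open-source project, https://github.com/Kenza-AI/sagify (sagify). It is a CLI tool that helps you train and deploy your own Machine Learning/Deep Learning models on SageMaker in a very simple way. I have managed to train and deploy all my ML models, whatever library I used (Keras, TensorFlow, scikit-learn, LightFM, spaCy, etc.). Essentially, you specify all your dependencies the classic Pythonic way, i.e. in requirements.txt, and sagify reads them and installs them in a Docker image. This Docker image can then be run on SageMaker for training and deployment.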

In addition, the sagify documentation (https://kenza-ai.github.io/sagify/) describes a one-time procedure for setting up your AWS account so that you avoid permission-related issues.
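Independently of sagify's own procedure, the permission named in the error can be granted with an IAM policy attached to the notebook's role. The following is a minimal sketch, not sagify's setup: it assumes the role from the error (arn:aws:iam::307504647302:role/default, default path assumed) and region eu-west-1, and the function name training_job_policy is hypothetical. Because CreateTrainingJob hands the execution role to SageMaker, the caller also needs iam:PassRole on that role. This is not a complete policy; the execution role usually also needs S3, ECR and CloudWatch Logs access, and attaching the AWS managed policy AmazonSageMakerFullAccess is the broader alternative.

```python
import json


def training_job_policy(region, account_id, execution_role_arn):
    """Build a minimal IAM policy document (as a dict) for starting training jobs.

    Sketch only: it covers creating/describing training jobs and passing the
    execution role, not data access, image pulls or logging.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # DescribeTrainingJob lets fit(wait=True) poll the job status
                "Action": ["sagemaker:CreateTrainingJob", "sagemaker:DescribeTrainingJob"],
                "Resource": "arn:aws:sagemaker:{}:{}:training-job/*".format(region, account_id),
            },
            {
                # CreateTrainingJob passes the execution role to SageMaker,
                # so the caller also needs iam:PassRole on that role.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": execution_role_arn,
                "Condition": {"StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}},
            },
        ],
    }


if __name__ == "__main__":
    policy = training_job_policy(
        "eu-west-1", "307504647302", "arn:aws:iam::307504647302:role/default")
    # Paste the printed JSON into an inline policy on the role in the IAM console
    print(json.dumps(policy, indent=2))
```

The Resource pattern training-job/* covers the job ARN from the error (arn:aws:sagemaker:eu-west-1:307504647302:training-job/decision-trees-sample-...), so jobs with auto-generated names are included.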

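You are probably using an AWS Educate account.

Currently, you cannot create training jobs or models with the SageMaker service from an AWS Educate Starter account.

For now, if you want to use SageMaker for training and deployment, you can use your own personal AWS account.

However, you can still use Jupyter notebooks through SageMaker with your AWS Educate account.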