Random forest in Amazon AWS SageMaker?

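I am looking to recreate a random forest model that I built locally and deploy it through SageMaker. The model is very basic, but for comparison I would like to use the same model in SageMaker. I don't see random forest among SageMaker's built-in algorithms (which seems weird). Is my only option to go the route of deploying my own custom model? I am still learning about containers, and it seems like a lot of work for something that is just a simple RandomForestClassifier() call locally. I just want to baseline against the out-of-the-box random forest model and show that it works the same when deployed through AWS SageMaker.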

RandomForestClassifier is not supported out of the box by SageMaker, but XGBoost (gradient boosted trees) as well as DecisionTreeClassifier from scikit-learn are both supported. You can access scikit-learn's DecisionTreeClassifier() directly from the SageMaker SDK.

Here is a notebook demonstrating how to use DecisionTreeClassifier from SageMaker's built-in scikit-learn.
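As a rough illustration of what "accessing scikit-learn from the SageMaker SDK" can look like, here is a minimal sketch using the SDK's SKLearn estimator. The script name train.py, the instance type, the framework version, the role ARN and the S3 URI are all assumptions for illustration; check the SDK documentation for the versions currently supported.

```python
def build_estimator_kwargs(role_arn, hyperparameters=None):
    """Collect keyword arguments for sagemaker.sklearn.estimator.SKLearn."""
    return {
        "entry_point": "train.py",       # hypothetical training script you provide
        "role": role_arn,                # IAM role SageMaker assumes for the job
        "instance_count": 1,
        "instance_type": "ml.m5.large",  # assumption: any supported type works
        "framework_version": "1.2-1",    # assumption: pick a version your SDK supports
        "py_version": "py3",
        # Passed to the script as command-line arguments, e.g. --max-depth 5
        "hyperparameters": dict(hyperparameters or {}),
    }


def train_and_deploy(role_arn, train_s3_uri, hyperparameters=None):
    """Start a training job, then host the resulting model on an endpoint."""
    # Imported lazily so the helper above works without the SDK installed.
    from sagemaker.sklearn.estimator import SKLearn

    estimator = SKLearn(**build_estimator_kwargs(role_arn, hyperparameters))
    estimator.fit({"train": train_s3_uri})  # "train" is the channel name
    return estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

Calling train_and_deploy(...) needs AWS credentials and incurs cost, so it is only defined here and never invoked; the returned predictor's endpoint should be deleted when you are done.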

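Deploying your own custom model via a Dockerfile is certainly possible too (it may seem daunting at first, but it isn't all that bad), but I agree that it wouldn't be ideal for a simple algorithm that is already included in SageMaker :)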

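Edit: as discussed in the comments, the original answer mixed up Random Forest and Random Cut Forest. The Random Cut Forest SageMaker algorithm documentation is available here: https://docs.aws.amazon.com/sagemaker/latest/dg/randomcutforest.html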

Random Cut Forest (RCF) example Jupyter notebook: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/random_cut_forest/random_cut_forest.ipynb

Edit 03/30/2020: added a link to the SageMaker Sklearn random forest demo

In SageMaker you have 3 options for writing scientific code:

Built-in algorithms
Open-source pre-written containers (available for sklearn, tensorflow, pytorch, mxnet, chainer; Keras can be written in the tensorflow and mxnet containers)
Bring your own container (for R, for example)

At the time of writing this post there is neither a random forest classifier nor a random forest regressor in the built-in library. There is an algorithm called Random Cut Forest in the built-in library, but it is an unsupervised algorithm for anomaly detection, a different use-case from the scikit-learn random forest used in a supervised fashion. It is easy, however, to use the open-source pre-written scikit-learn container to implement your own. There is a demo showing how to use Sklearn's random forest in SageMaker, with training orchestration both from the high-level SDK and from boto3. You can also take this other public sklearn-on-sagemaker demo and change the model. One benefit of the pre-written containers over the "Bring your own" option is that the Dockerfile is already written, and so is the web serving stack.
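To make the pre-written scikit-learn container route more concrete, here is a minimal sketch of a training script for it. It relies on the container's conventions: the SM_MODEL_DIR and SM_CHANNEL_TRAIN environment variables, and a model_fn function that the serving stack calls to load the model. The file name train.csv, the label-in-first-column layout and the --n-estimators flag are assumptions made for this example, not something taken from the demos mentioned above.

```python
import argparse
import os

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train(train_dir, model_dir, n_estimators=100, random_state=0):
    """Fit a random forest on train.csv (label in column 0) and save it."""
    data = np.loadtxt(os.path.join(train_dir, "train.csv"), delimiter=",")
    X, y = data[:, 1:], data[:, 0]
    # A fixed random_state keeps the forest reproducible, which helps when
    # comparing against a model trained locally with the same seed.
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    model.fit(X, y)
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))
    return model


def model_fn(model_dir):
    """Loaded by the SageMaker scikit-learn serving stack at endpoint start."""
    return joblib.load(os.path.join(model_dir, "model.joblib"))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--n-estimators", type=int, default=100)
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "."))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "."))
    args, _ = parser.parse_known_args(argv)  # ignore hyperparameters we don't use
    train(args.train, args.model_dir, n_estimators=args.n_estimators)


# Only start training automatically inside a SageMaker training container,
# where SM_CHANNEL_TRAIN is set; importing or running this file elsewhere
# just defines the functions.
if __name__ == "__main__" and "SM_CHANNEL_TRAIN" in os.environ:
    main()
```

Because train and model_fn are plain functions, the same file can be exercised locally on a small CSV before it is handed to SageMaker, which is a cheap way to check that the local and deployed models make the same predictions.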

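Regarding your surprise that random forest is not featured in the built-in algorithms: the library and its 18 algorithms already cover a rich set of use-cases. For example, for supervised learning over structured data (the usual use-case for random forest), if you want to stick to the built-ins, then depending on your priorities (accuracy, inference latency, training scale, costs...) you can use SageMaker XGBoost or Linear Learner. XGBoost has won a large number of data-mining competitions (every winning team in the top 10 of KDDcup 2015 used XGBoost, according to the XGBoost paper) and scales well. Linear Learner is extremely fast at inference and can be trained at scale, in mini-batch fashion over GPU(s). Factorization Machines (linear + 2nd degree interaction with weights being column embedding dot-products) and SageMaker kNN are other options. Also, things are not set in stone, and the list of built-in algorithms is being improved fast.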

containers

amazon-web-services

random-forest

docker

amazon-sagemaker