Predictions in SageMaker ::: Writing Function To Split Big Data-frame Into Batches For Predictions

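I am using Amazon SageMaker for model training and prediction. However, I have run into a problem with InvokeEndpoint: each request is still limited to 5MB.

I have more than 1 million rows as distinct inputs, and I understand that I should either send a separate request for each input, or split the inputs into batches of a size that fits the limit and send each batch as a separate request (possibly in parallel against the same endpoint).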
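One rough way to pick a batch size that fits the limit is to measure how many bytes a sample of rows takes once serialized, and divide the limit by that. The sketch below assumes the rows are sent as header-less CSV text; the helper name `rows_per_request`, the `safety` margin, and the sample size are all hypothetical choices, not part of any SageMaker API.

```r
# Hypothetical helper: estimate how many rows fit into one request,
# assuming each row is serialized as one comma-separated line of text.
rows_per_request <- function(data, limit_bytes = 5 * 1024^2,
                             sample_rows = 100, safety = 0.8) {
    stopifnot(NROW(data) > 0)
    sample_rows <- min(sample_rows, NROW(data))
    sample <- as.matrix(data[seq_len(sample_rows), , drop = FALSE])
    # Build the CSV line for each sampled row and measure its size in bytes
    lines <- apply(sample, 1, paste, collapse = ",")
    bytes_per_row <- sum(nchar(lines, type = "bytes") + 1) / sample_rows  # +1 for the newline
    # Leave some headroom (safety < 1) because rows may vary in length
    max(1, floor(safety * limit_bytes / bytes_per_row))
}

# Small demonstration on synthetic data: every row serializes to "1,1,1"
demo <- matrix(1, nrow = 10, ncol = 3)
rows_demo <- rows_per_request(demo, limit_bytes = 600, safety = 1)
```

The estimate is only as good as the sample: if later rows are much longer than the sampled ones, a smaller `safety` factor would be the cautious choice.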

    ### Making predictions based on 1 data frame of 500 rows
    ### approximately 500 rows are ~500MB

    num_predict_rows <- 500 
    test_sample <- as.matrix(gender_test[1:num_predict_rows, ])
    dimnames(test_sample)[[2]] <- NULL

    library(stringr)
    predictions <- model_endpoint$predict(data_tbl_test)
    predictions <- str_split(predictions, pattern = ',', simplify = TRUE)
    predictions <- as.numeric(predictions)

    data_tbl_pred <- cbind(predicted_sample = predictions, data_tbl_test[1:num_predict_rows, ])

My question is:

How can I write a function to split the big data frame into batches for predictions?

Thanks in advance.

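You may need to adapt this to structure the output the way you need, but if I understand your code correctly, this should make predictions on each batch and then store the results in `all_preds`.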

    library(stringr)

    # some initialization
    N <- NROW(data_tbl_test)
    num_predict_rows <- 500
    n <- ceiling(N / num_predict_rows)   # number of batches
    k <- 1   # This should be the number of columns in model_endpoint$predict(...)
    all_preds <- matrix(0, N, k)   # where the predictions will be stored

    # get batch indices (the last batch may be shorter than num_predict_rows)
    ind <- vector("list", n)
    for (i in seq_len(n))
        ind[[i]] <- seq((i - 1) * num_predict_rows + 1, min(i * num_predict_rows, N))

    # predict on each batch
    for (i in seq_len(n)) {
        batch <- data_tbl_test[ind[[i]], , drop = FALSE]
        predictions <- model_endpoint$predict(batch)
        predictions <- str_split(predictions, pattern = ',', simplify = TRUE)
        predictions <- as.numeric(predictions)
        all_preds[ind[[i]], ] <- predictions
    }
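Since the question asks for a function, the same loop could be wrapped up roughly as follows. This is only a sketch: `predict_in_batches` is a hypothetical name, the prediction call is passed in as `predict_fn`, and it assumes the endpoint returns one comma-separated string of numbers per batch, ordered row by row. The `fake_predict` function is a stand-in so the example runs without an endpoint.

```r
# Hypothetical helper: split `data` into row batches, call `predict_fn` on each
# batch, and collect the numeric predictions in one N x k matrix.
predict_in_batches <- function(data, predict_fn, batch_size = 500, k = 1) {
    N <- NROW(data)
    all_preds <- matrix(NA_real_, N, k)
    if (N == 0) return(all_preds)
    for (start in seq(1, N, by = batch_size)) {
        idx <- start:min(start + batch_size - 1, N)
        batch <- data[idx, , drop = FALSE]
        response <- predict_fn(batch)
        # Assumption: the response is a single comma-separated string of numbers
        values <- as.numeric(unlist(strsplit(response, ",", fixed = TRUE)))
        stopifnot(length(values) == length(idx) * k)
        # Assumption: values are ordered row by row when k > 1
        all_preds[idx, ] <- matrix(values, nrow = length(idx), ncol = k, byrow = TRUE)
    }
    all_preds
}

# Stand-in for model_endpoint$predict: returns the row sums as "a,b,c,..."
fake_predict <- function(batch) paste(rowSums(batch), collapse = ",")

demo <- matrix(1:12, nrow = 6)
preds <- predict_in_batches(demo, fake_predict, batch_size = 4)
```

With a real endpoint the call would presumably look like `predict_in_batches(data_tbl_test, function(b) model_endpoint$predict(b), batch_size = num_predict_rows)`, adjusting `k` to the number of values returned per row.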

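Have you considered using SageMaker Batch Transform for this use case instead? It takes care of streaming the data from S3 to the inference container and supports several ways of splitting the data.

See https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html for an overview. Also see https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-batch-code.html for the details if you are bringing your own inference container.

Some example notebooks: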

  1. https://github.com/awslabs/amazon-sagemaker-examples/tree/4cc457faf4873c0ce674b6b5f857b5ee85967bf6/advanced_functionality/batch_transform

  2. https://github.com/awslabs/amazon-sagemaker-examples/blob/c80657daa9d42b7c9b12729d6fa4b825fd980730/sagemaker-python-sdk/scikit_learn_iris/Scikit-learn%20Estimator%20Example%20With%20Batch%20Transform.ipynb

  3. https://github.com/awslabs/amazon-sagemaker-examples/blob/7a2618a669a00b08458504c0055f0a13dd5ccfd7/sagemaker-python-sdk/mxnet_mnist/mxnet_mnist_with_batch_transform.ipynb

If you have detailed questions or need support with a specific transform job, please visit the AWS forum: https://forums.aws.amazon.com/forum.jspa?forumID=285&start=0