Configure and Deploy Lambda Pipeline in code

Question

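I would like to know whether there is any AWS service or project that lets us configure a data pipeline using AWS Lambdas in code. I am looking for something like the example below. Assume there is a library called pipeline: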

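# Hypothetical library: `pipeline` does not exist, this only illustrates the desired API.
# `lambda` is a reserved word in Python, so the helper is spelled `lambda_` here.
from pipeline import connect, s3, lambda_, deploy

p = connect(s3('input-bucket/prefix'),
            lambda_(myPythonFunc, dependencies=[list_of_dependencies]),
            s3('output-bucket/prefix'))
deploy(p)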

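Of course, there could be many variations of this idea. This use case assumes only one S3 bucket; there could, for example, be a list of input S3 buckets.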

Can this be done with AWS Data Pipeline? The documentation I have (quickly) read says that Lambda is used to trigger a pipeline.

Answer 1

I think the closest thing available is the state machine functionality in the newly released AWS Step Functions. With these you can coordinate multiple steps that transform your data. I don't believe that they support standard event sources, so you would have to create a standard Lambda function (potentially using the Serverless Application Model) to read from S3 and trigger your state machine.
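As a rough illustration of that glue function, here is a minimal sketch of a Lambda handler that receives an S3 event notification and starts one state machine execution per uploaded object. It assumes the state machine ARN is supplied through an environment variable named STATE_MACHINE_ARN (a name chosen only for this example), and that the state machine accepts a JSON input with `bucket` and `key` fields (also an assumption). The optional `sfn_client` parameter exists only so the handler can be exercised without AWS credentials.

```python
import json
import os
import urllib.parse


def build_execution_inputs(event):
    """Turn an S3 notification event into one JSON input string per object."""
    inputs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        inputs.append(json.dumps({"bucket": bucket, "key": key}))
    return inputs


def handler(event, context, sfn_client=None):
    """Lambda entry point: start a Step Functions execution for each S3 object."""
    if sfn_client is None:
        import boto3  # imported lazily; only needed when running against AWS
        sfn_client = boto3.client("stepfunctions")

    # Assumption: the ARN is passed in via this (hypothetical) environment variable.
    state_machine_arn = os.environ["STATE_MACHINE_ARN"]

    started = []
    for payload in build_execution_inputs(event):
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=payload,
        )
        started.append(response["executionArn"])
    return {"started": started}
```

The function would still need an S3 event notification on the input bucket and permission to call states:StartExecution; both could be declared in a Serverless Application Model template alongside the function itself.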


amazon-s3

amazon-web-services

amazon-data-pipeline

aws-lambda