Partial retry of parallel states in a Step Functions state machine

Question

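In a Parallel state with multiple task branches, if some of them (say 1 or 2 branches, but not all) fail after exhausting all of their own retry attempts, how can you programmatically retry the state machine execution without re-running the other branches, given that you cannot manually retry a failed task branch? Is there a design pattern for this situation?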

The workflow looks like this: Start -> Parallel (Branch A, B, C, D, E) -> Report -> End

Suppose B and D fail but the others succeed.

I would like to partially retry the workflow like this:

Start -> Parallel (Branch B, D) -> Report -> End

Answer 1

Here is a solution. It is tricky because we have to implement our own error handling and retry limits.

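1. At the head of each branch, use a Pass state to add a branch identifier to the branch payload. A Lambda Task then evaluates whether the branch's tasks need to run. It must handle both the first pass (the original payload) and subsequent iterations (a payload decorated with an array of failed branches, e.g. wantsRetryBranch: [branch identifiers], which is added in #3). A Choice state decides whether the branch should proceed based on the Lambda's output. The Choice should evaluate to true for new executions and for retries of a branch that had an error.
2. All branches must return successfully.* If an error occurs in a branch, we must catch it and return a payload that indicates the error, e.g. {status: "fail", error: <error-detail>}
3. After the Parallel state, a Lambda Task evaluates the output array for errors. A "Retry?" Choice state follows. If a branch has an error and the retry counter has not reached its limit, loop back to #1. Otherwise, the execution ends in Success or Fail, depending on whether there are errors.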
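
The per-branch check in step 1 could be sketched as a Python Lambda handler like the one below. This is only an illustration under assumptions: the key names `branchId` and `shouldDo` are hypothetical (only `wantsRetryBranch` comes from the answer), and the handler assumes the Pass state at the head of the branch has already written the branch identifier into the payload.

```python
from typing import Any


def check_should_do(event: dict, context: Any = None) -> dict:
    """Decide whether this branch should run on the current pass."""
    # Hypothetical key, assumed to be written by the Pass state at the branch head.
    branch_id = event["branchId"]
    # Absent on the first pass; on later passes it lists the branches that failed.
    wants_retry = event.get("wantsRetryBranch")

    # First pass: no retry list yet, so every branch runs.
    # Later passes: only the branches named in the list run again.
    should_do = wants_retry is None or branch_id in wants_retry

    # Pass the payload through and add the flag that the Choice state reads.
    return {**event, "shouldDo": should_do}
```

The Choice state that follows would then branch on `$.shouldDo`, sending the branch either to its real tasks or to the empty Pass state.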

State outline

Parallel
  Identify [Pass] # a pass task that labels the branch with an ID  (A, B, C, D)
  CheckShouldDo [Lambda] # a lambda task that returns true if the branch should run - if wantsRetryBranch array is missing or has branch ID
  DoBranch? [Choice] # evaluate previous step's output; proceed if true, skip to empty pass task if false
    Skip [Pass] # if shouldDo === false, bypass the tasks
    Tasks # do the actual work; catch errors, write output to tasks key - task.taskName.status should be `success` or `fail`

CheckShouldRetry [Lambda] # evaluate array coming from parallel - look for task errors and whether retry limit is reached
# (continue) returns a payload map consistent with the initial event or full history; make sure to increment retry limits
Retry? [Choice] # evaluate the map from the previous step - if payload-like, loop back to Parallel; or end with success or fail

Success # all branches passed
Fail # if after retries, wantsRetryBranch array is not empty
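
The CheckShouldRetry step in the outline above might look roughly like the following Python handler. It is a sketch, not the answer's actual code: it assumes the Parallel state uses `ResultPath: "$.branches"` so the original input survives next to the branch outputs, and the names `branches`, `branchId`, `status`, `retryCount`, `outcome`, and `MAX_RETRIES` are all hypothetical.

```python
from typing import Any

MAX_RETRIES = 3  # hypothetical limit; pick whatever suits the workflow


def check_should_retry(event: dict, context: Any = None) -> dict:
    """Inspect the Parallel output array and decide: success, retry, or fail."""
    branch_outputs = event["branches"]  # assumed location of the Parallel result
    retry_count = event.get("retryCount", 0)

    # Keep the original payload so the next pass sees the same input.
    payload = {k: v for k, v in event.items() if k != "branches"}

    # Branches that were skipped or succeeded do not report status "fail".
    failed = [b["branchId"] for b in branch_outputs if b.get("status") == "fail"]

    if not failed:
        return {**payload, "outcome": "success", "wantsRetryBranch": []}
    if retry_count >= MAX_RETRIES:
        return {**payload, "outcome": "fail", "wantsRetryBranch": failed}
    # Loop back to Parallel with the failed branch IDs and a bumped counter.
    return {
        **payload,
        "outcome": "retry",
        "wantsRetryBranch": failed,
        "retryCount": retry_count + 1,
    }
```

The "Retry?" Choice state would then route on `$.outcome`: `retry` goes back to Parallel, `success` and `fail` go to the matching terminal states.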

A picture is worth a thousand words

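* A failure in any Parallel branch stops all branches. When that error is raised, we cannot see which branches succeeded. We therefore have to track branch success and failure ourselves, without relying on the built-in Step Functions error handling.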
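
To illustrate the footnote, here is one way the work portion of a branch could catch its own errors so that the branch always ends normally. The sketch builds an Amazon States Language fragment as a Python dict. The state names (`DoWork`, `MarkSucceeded`, `MarkFailed`) and the result paths are assumptions, the ARN passed in is a placeholder, and the Identify, CheckShouldDo, and Skip states from the outline are omitted for brevity.

```python
import json


def branch_states(task_arn: str) -> dict:
    """Build a simplified branch whose task failures are caught, not propagated."""
    return {
        "StartAt": "DoWork",
        "States": {
            "DoWork": {
                "Type": "Task",
                "Resource": task_arn,
                # Keep the branch payload and store the task result beside it.
                "ResultPath": "$.result",
                # Catch everything so the Parallel state never sees the failure.
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error",
                        "Next": "MarkFailed",
                    }
                ],
                "Next": "MarkSucceeded",
            },
            "MarkSucceeded": {
                "Type": "Pass",
                "Result": "success",
                "ResultPath": "$.status",
                "End": True,
            },
            "MarkFailed": {
                "Type": "Pass",
                "Result": "fail",
                "ResultPath": "$.status",
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:branch-b-task"
    print(json.dumps(branch_states(arn), indent=2))
```

Because both paths end in a Pass state, every branch contributes an element to the Parallel output array, and the status field tells the later Lambda which branches need another attempt.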


aws-step-functions
