Google Cloud Run concurrency limits + autoscaling clarifications
Google Cloud Run allows you to specify a request concurrency limit for each container. The implication of that input field is "when this concurrency is reached, start a new container instance." Two clarifying questions:
Is there any way to set Cloud Run to anticipate the concurrency limit being reached, and spawn a new container a little before that happens, to ensure that requests over the concurrency limit of Container 1 are seamlessly handled by Container 2 without the cold start time affecting the requests?
Imagine we have Maximum Instances set to 10, Concurrency set to 10, and there are currently 100 requests being processed (i.e. we've maxed out our capacity and cannot autoscale any more). What happens to the 101st request? Will it be queued up for some period of time, or will a 5XX be returned immediately?
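For context, both of those knobs are set per revision at deploy time. As a rough sketch with the gcloud CLI (the service name, image, and region below are placeholders, not values from the question):

    gcloud run deploy my-service \
      --image gcr.io/my-project/my-image \
      --concurrency 10 \
      --max-instances 10 \
      --region us-central1 \
      --platform managed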
Is there any way to set Cloud Run to anticipate the concurrency limit being reached, and spawn a new container a little before that happens to ensure that requests over the concurrency limit of Container 1 are seamlessly handled by Container 2 without the cold start time affecting the requests?
No. Cloud Run does not attempt to predict future traffic patterns.
Imagine we have Maximum Instances set to 10, Concurrency set to 10, and there are currently 100 requests being processed (i.e. we've maxed out our capacity and cannot autoscale any more). What happens to the 101st request? Will it be queued up for some period of time, or will a 5XX be returned immediately?
An HTTP 429 Too Many Requests error will be returned.
[Edit - Google Cloud documentation on request queueing]
Under normal circumstances, your revision scales out by creating new instances to handle incoming traffic load. But when you set a maximum instances limit, in some scenarios there will be insufficient instances to meet that traffic load. In that case, incoming requests queue for up to 60 seconds. During this 60 second window, if an instance finishes processing requests, it becomes available to process queued requests. If no instances become available during the 60 second window, the request fails with a 429 error code on Cloud Run (fully managed).
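Given that behaviour, a caller that might push the service past its max-instances ceiling can treat 429 as retryable rather than fatal. Below is a minimal client-side sketch in Python using the requests library; the service URL, attempt count, and backoff schedule are illustrative assumptions, not anything Cloud Run mandates:

    import random
    import time

    import requests

    CLOUD_RUN_URL = "https://my-service-xyz-uc.a.run.app/"  # hypothetical service URL

    def call_with_backoff(url, max_attempts=5):
        """Call a Cloud Run endpoint, retrying on 429 with exponential backoff.

        429 is what Cloud Run (fully managed) returns once the 60-second request
        queue is exhausted and no instance under the max-instances cap frees up.
        """
        for attempt in range(max_attempts):
            resp = requests.get(url, timeout=120)
            if resp.status_code != 429:
                return resp
            # Back off before retrying: roughly 1s, 2s, 4s, ... plus jitter.
            time.sleep(2 ** attempt + random.random())
        raise RuntimeError(f"Still throttled after {max_attempts} attempts")

    # Usage:
    # response = call_with_backoff(CLOUD_RUN_URL)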