GKE Cluster autoscaler profile for older clusters

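When you create a new Kubernetes cluster in GKE, there is now a new tab:

Automation - Set cluster-level criteria for automatic maintenance, autoscaling, and auto-provisioning. Edit the node pool for automation like auto-scaling, auto-upgrades, and repair.

It has two options: Balanced (default) and Optimize utilization (beta).

Can't we set this for older clusters?

We are running an old GKE version, 1.14, and we want the cluster to autoscale when the existing nodes reach 70% resource utilization.

Currently we have 2 different node pools, and only one of them has node auto-provisioning enabled. During peak hours, when the HPA scales up Pods, new nodes take some time to join the cluster, and sometimes the existing nodes start to crash because of resource pressure.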

You can set the autoscaling profile through:

GCP Cloud Console (Web UI) -> Kubernetes Engine -> CLUSTER-NAME -> Edit -> Autoscaling profile

The screenshot of this setting was taken on GKE version 1.14.10-gke.50, so the option is available on 1.14 clusters.

You can also run:

gcloud beta container clusters update CLUSTER-NAME --autoscaling-profile optimize-utilization
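A minimal sketch of the same command wrapped in a shell function that applies a profile and then reads it back. The function name `set_autoscaling_profile`, the zonal `--zone` flag (a regional cluster would use `--region`), and the assumption that `describe` reports the profile under `autoscaling.autoscalingProfile` are illustrative, not part of the answer:

```shell
# Hypothetical helper: apply an autoscaling profile to an existing cluster
# and print the profile GKE reports afterwards.
# Usage: set_autoscaling_profile CLUSTER ZONE PROFILE
#   PROFILE is "optimize-utilization" or "balanced" (the default profile).
set_autoscaling_profile() {
  local cluster="$1" zone="$2" profile="$3"

  # Reject anything other than the two documented profiles.
  case "$profile" in
    optimize-utilization|balanced) ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac

  # Same update command as above, with an explicit location (assumed zonal cluster).
  gcloud beta container clusters update "$cluster" \
    --zone "$zone" \
    --autoscaling-profile "$profile" || return 1

  # Read the profile back (assumed field path); expected to print
  # something like OPTIMIZE_UTILIZATION or BALANCED.
  gcloud beta container clusters describe "$cluster" \
    --zone "$zone" \
    --format="value(autoscaling.autoscalingProfile)"
}
```

For example, `set_autoscaling_profile my-cluster us-central1-a optimize-utilization` (cluster name and zone are placeholders); passing `balanced` switches back to the default profile.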

The official documentation states:

You can specify which autoscaling profile to use when making such decisions. The currently available profiles are:

balanced: The default profile.

optimize-utilization: Prioritize optimizing utilization over keeping spare resources in the cluster. When selected, the cluster autoscaler scales down the cluster more aggressively: it can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency. We do not currently recommend using this profile with serving workloads.

-- Cloud.google.com: Kubernetes Engine: Cluster autoscaler: Autoscaling profiles

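This setting (optimize-utilization) may not be the best choice for serving workloads. It scales down (removes nodes) more aggressively, which automatically reduces the amount of spare resources in your cluster and can make it more vulnerable to workload spikes.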

To answer this part of the question:

we are running old GKE version 1.14 we want to auto-scale cluster when 70% of resource utilization of existing nodes.

As stated in the documentation:

Cluster autoscaler increases or decreases the size of the node pool automatically, based on the resource requests (rather than actual resource utilization) of Pods running on that node pool's nodes. It periodically checks the status of Pods and nodes, and takes action:

If Pods are unschedulable because there are not enough nodes in the node pool, cluster autoscaler adds nodes, up to the maximum size of the node pool.

-- Cloud.google.com: Kubernetes Engine: Cluster autoscaler: How cluster autoscaler works

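You cannot scale the cluster directly on a resource utilization percentage (70%). The cluster autoscaler acts only when Pods cannot be scheduled on the nodes that currently exist.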
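To make the "requests, not utilization" point concrete, here is a deliberately simplified model of the scale-up trigger. It assumes CPU is the only resource and ignores memory, taints, affinity and the node pool maximum, all of which the real cluster autoscaler also considers. The function name and the `ALLOCATABLE:REQUESTED` input format are hypothetical:

```shell
# Hypothetical, simplified model of when the cluster autoscaler scales up:
# a new Pod triggers a scale-up only if it fits on no existing node.
# Inputs are CPU millicores; each node is given as ALLOCATABLE:REQUESTED,
# where REQUESTED is the sum of CPU *requests* of the Pods already on it.
# Returns 0 if a scale-up is needed, 1 if the Pod can be scheduled.
needs_scale_up() {
  local pod_request node allocatable requested
  pod_request="$1"
  shift
  for node in "$@"; do
    allocatable="${node%%:*}"
    requested="${node##*:}"
    # Free room = allocatable minus requests; live CPU usage is never consulted.
    if [ $((allocatable - requested)) -ge "$pod_request" ]; then
      return 1   # fits on this node: no new node needed
    fi
  done
  return 0       # fits nowhere: Pod stays Pending, CA adds a node
}
```

A node whose real CPU usage is at 90% but whose Pods only request 3000m out of 3920m allocatable (hypothetical numbers) still accepts a 500m Pod, so no node is added; the utilization percentage never enters the decision.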

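You can use the Horizontal Pod Autoscaler to scale the number of replicas of a Deployment based on CPU usage. These Pods can act as a buffer for increased traffic: past a specific threshold, the HPA creates new Pods, and if those Pods are unschedulable, the CA (Cluster Autoscaler) requests a new node. This buffer is the mechanism that protects the application against sudden spikes it could not otherwise handle.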
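The HPA side of this works on a ratio: per the Kubernetes HPA algorithm, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and nothing changes while that ratio stays within a tolerance of 1.0 (0.1 by default). A small integer-only sketch using the 70% target from the question; the function name is hypothetical, and, as in the HPA, CPU utilization is a percentage of the Pods' CPU requests:

```shell
# Integer-only sketch of the HPA replica calculation for a CPU target.
# current_pct and target_pct are average CPU utilization as a percentage
# of the Pods' CPU requests (e.g. target 70 for "70%").
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_PCT TARGET_PCT
desired_replicas() {
  local replicas current_pct target_pct
  replicas="$1"; current_pct="$2"; target_pct="$3"
  # Default tolerance 0.1: keep the replica count while current/target is in [0.9, 1.1].
  if [ $((current_pct * 100)) -ge $((target_pct * 90)) ] &&
     [ $((current_pct * 100)) -le $((target_pct * 110)) ]; then
    echo "$replicas"
    return
  fi
  # ceil(replicas * current / target) using integer arithmetic.
  echo $(( (replicas * current_pct + target_pct - 1) / target_pct ))
}

# An HPA with this target for a Deployment (name and bounds are placeholders):
#   kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10
```

With 4 replicas averaging 90% against a 70% target, the HPA asks for 6 replicas. If the 2 new Pods do not fit on the existing nodes they stay Pending, and that is what makes the cluster autoscaler add a node.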

A detailed explanation of the buffer and of reserving spare capacity is available at:

Cloud.google.com: Solutions: Best practices for running cost effective kubernetes applications on gke: Autoscaler and over-provisioning
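A minimal sketch of the over-provisioning pattern: low-priority placeholder Pods reserve spare capacity, real Pods preempt them when they need room, and the evicted placeholders go Pending, which triggers a node scale-up in the background. The names (`overprovisioning`, `reserve-resources`), the `pause` image tag, the priority value, and the replica and CPU numbers are assumptions to adapt. `scheduling.k8s.io/v1` PriorityClass is GA as of Kubernetes 1.14, so it should be usable on the cluster in the question:

```shell
# Prints a PriorityClass and a Deployment of low-priority "pause" Pods.
# The pause Pods reserve headroom; when real workloads need the space, the
# scheduler preempts them, they go Pending, and CA adds a node for them.
# Usage: overprovisioning_manifest REPLICAS CPU_REQUEST | kubectl apply -f -
overprovisioning_manifest() {
  local replicas cpu
  replicas="$1"; cpu="$2"
  cat <<EOF
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
# Negative priority so normal Pods (priority 0) can preempt these Pods.
# Assumed to stay at or above the autoscaler's expendable-pods cutoff
# (-10 by default upstream) so CA still scales up for the evicted ones.
value: -1
globalDefault: false
description: "Placeholder Pods that reserve spare capacity."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: reserve-resources
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: ${cpu}
EOF
}
```

For example: `overprovisioning_manifest 2 500m | kubectl apply -f -`. The size of the buffer (replicas × CPU request) is the trade-off: more headroom absorbs spikes while new nodes boot, at the cost of paying for idle capacity.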

There is extensive documentation on running cost-effective applications on GKE:

Cloud.google.com: Solutions: Best practices for running cost effective kubernetes applications on gke

I encourage you to check the link above, as it contains many tips and insights on scaling, over-provisioning, workload spikes, HPA, VPA, and more.

Additional resources:

Cloud.google.com: Kubernetes Engine: Node auto provisioning

google-cloud-platform

kubernetes

google-kubernetes-engine

kubernetes-pod