spark on kubernetes: Expected HTTP 101 response but was '403 Forbidden'

I am using Spark 2.4.4. I run into a problem when running spark-submit on my Ubuntu machine with the following configuration and environment:

cpchung:small$ minikube start --kubernetes-version=1.17.0
  minikube v1.6.2 on Ubuntu 19.10
✨  Selecting 'kvm2' driver from user configuration (alternates: [virtualbox none])
  Tip: Use 'minikube start -p <name>' to create a new cluster, or 'minikube delete' to delete this one.
  Using the running kvm2 "minikube" VM ...
⌛  Waiting for the host to be provisioned ...
  Preparing Kubernetes v1.17.0 on Docker '18.09.9' ...
  Downloading kubelet v1.17.0
  Downloading kubeadm v1.17.0
  Pulling images ...
  Launching Kubernetes ... 
  Done! kubectl is now configured to use "minikube"
cpchung:small$ minikube docker-env
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.39.246:2376"
export DOCKER_CERT_PATH="/home/cpchung/.minikube/certs"
# Run this command to configure your shell:
# eval $(minikube docker-env)
cpchung:small$ eval $(minikube docker-env)
cpchung:small$ ./run.sh 

My run.sh looks like this:

spark-submit --class SimpleApp \
--master "k8s://https://192.168.39.246:8443" \
--deploy-mode cluster \
--conf spark.kubernetes.container.image=somename:latest \
--conf spark.kubernetes.image.pullPolicy=IfNotPresent \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
local:///small.jar
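The submit command above references a `spark` service account (`spark.kubernetes.authenticate.driver.serviceAccountName=spark`). As a sketch of how such an account can be set up (assuming the default namespace and the `edit` cluster role; the manifest and file names below are my own, not from the question):

```shell
# Generate an RBAC manifest granting the "spark" service account edit rights
# in the default namespace (assumption: default namespace, "edit" role).
cat > spark-rbac.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
EOF
# Apply it to the cluster with:
#   kubectl apply -f spark-rbac.yaml
```

Note that a missing or under-privileged service account produces a similar 403, so it is worth ruling this out before patching Spark.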

Then I got this:

simpleapp-1580791262398-driver   0/1     Error    0          21s
cpchung:small$ kubectl logs simpleapp-1580791262398-driver

20/02/04 04:41:10 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/02/04 04:41:10 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://simpleapp-1580791262398-driver-svc.default.svc:4040
20/02/04 04:41:10 INFO SparkContext: Added JAR file:///small.jar at spark://simpleapp-1580791262398-driver-svc.default.svc:7078/jars/small.jar with timestamp 1580791270603
20/02/04 04:41:14 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
20/02/04 04:41:14 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 - 
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'

Spark up to v2.4.4 uses fabric8 Kubernetes client v4.1.2, which works only with Kubernetes APIs 1.9.0 - 1.12.0; please refer to the compatibility matrix and pom.xml.

Spark 2.4.5 will upgrade the Kubernetes client to v4.6.1, which supports Kubernetes APIs up to 1.15.2.

So you have the following options:

  • Downgrade your Kubernetes cluster to 1.12
  • Wait until Spark 2.4.5/3.0.0 is released and downgrade your Kubernetes cluster to 1.15.2
  • Upgrade Spark's fabric8 Kubernetes client dependency and custom-build Spark and its Docker image
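As a sketch of the compatibility check behind these options (the 1.9 - 1.12 supported range for Spark <= 2.4.4 comes from the compatibility matrix above; the version string and the parsing are my own illustration):

```shell
# Check whether a cluster's minor version falls inside the range supported by
# the fabric8 client bundled with Spark <= 2.4.4 (1.9 - 1.12).
server_version="1.17.0"   # e.g. taken from: kubectl version
minor=$(echo "$server_version" | cut -d. -f2)
if [ "$minor" -ge 9 ] && [ "$minor" -le 12 ]; then
  result="compatible"
else
  result="incompatible"
fi
echo "Kubernetes $server_version: $result"
```

With the 1.17.0 cluster from the question this prints `incompatible`, which matches the 403 seen in the driver log.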

Hope this helps.

Update 1

To run Spark on Kubernetes clusters with API version 1.17.0+, you need to patch the Spark code based on tag v2.4.5 with the following changes:

diff --git a/resource-managers/kubernetes/core/pom.xml b/resource-managers/kubernetes/core/pom.xml
index c9850ca512..41468e1363 100644
--- a/resource-managers/kubernetes/core/pom.xml
+++ b/resource-managers/kubernetes/core/pom.xml
@@ -29,7 +29,7 @@
   <name>Spark Project Kubernetes</name>
   <properties>
     <sbt.project.name>kubernetes</sbt.project.name>
-    <kubernetes.client.version>4.6.1</kubernetes.client.version>
+    <kubernetes.client.version>4.7.1</kubernetes.client.version>
   </properties>
 
   <dependencies>
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
index 575bc54ffe..c180ea3a7b 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
@@ -67,7 +67,8 @@ private[spark] class BasicDriverFeatureStep(
       .withAmount(driverCpuCores)
       .build()
     val driverMemoryQuantity = new QuantityBuilder(false)
-      .withAmount(s"${driverMemoryWithOverheadMiB}Mi")
+      .withAmount(driverMemoryWithOverheadMiB.toString)
+      .withFormat("Mi")
       .build()
     val maybeCpuLimitQuantity = driverLimitCores.map { limitCores =>
       ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
index d89995ba5e..9c589ace92 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
@@ -86,7 +86,8 @@ private[spark] class BasicExecutorFeatureStep(
     // executorId
     val hostname = name.substring(Math.max(0, name.length - 63))
     val executorMemoryQuantity = new QuantityBuilder(false)
-      .withAmount(s"${executorMemoryTotal}Mi")
+      .withAmount(executorMemoryTotal.toString)
+      .withFormat("Mi")
       .build()
     val executorCpuQuantity = new QuantityBuilder(false)
       .withAmount(executorCoresRequest)

QuantityBuilder changed the way it parses its input with this PR, available since fabric8 Kubernetes client v4.7.0.
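For the custom-build route, a small sketch of sanity-checking the patched client version before rebuilding. The pom fragment is inlined here only to make the check self-contained; in a real Spark checkout you would grep the patched file and then rebuild, e.g. with `./build/mvn -Pkubernetes -DskipTests clean package` followed by `./bin/docker-image-tool.sh`:

```shell
# Recreate the (patched) pom path so the check is self-contained; in a real
# checkout this file already exists and the heredoc is unnecessary.
mkdir -p resource-managers/kubernetes/core
cat > resource-managers/kubernetes/core/pom.xml <<'EOF'
<properties>
  <kubernetes.client.version>4.7.1</kubernetes.client.version>
</properties>
EOF
# Confirm the fabric8 client version the build will pull in:
grep -o '<kubernetes.client.version>[^<]*' resource-managers/kubernetes/core/pom.xml
```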

Update 2

The Spark image is based on openjdk:8-jdk-slim, which runs Java 8u252 and has a bug related to OkHttp. To fix it we require fabric8 Kubernetes client v4.9.2; please refer to its release notes for more details.
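A quick sketch of checking whether an image's Java build is the one mentioned above (the `1.8.0_252` format is what `java -version` prints on the openjdk:8 images; treating builds >= 8u252 as affected is my own reading of the note above):

```shell
# Detect the affected Java 8 update from a version string.
version="1.8.0_252"   # e.g. from: docker run --rm openjdk:8-jdk-slim java -version
update=${version##*_}  # strip everything up to the underscore -> "252"
if [ "$update" -ge 252 ]; then
  status="affected"    # needs fabric8 Kubernetes client >= 4.9.2
else
  status="ok"
fi
echo "Java 8u${update}: ${status}"
```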

The patch above can also be simplified:

diff --git a/resource-managers/kubernetes/core/pom.xml b/resource-managers/kubernetes/core/pom.xml
index c9850ca512..595aaba8dd 100644
--- a/resource-managers/kubernetes/core/pom.xml
+++ b/resource-managers/kubernetes/core/pom.xml
@@ -29,7 +29,7 @@
   <name>Spark Project Kubernetes</name>
   <properties>
     <sbt.project.name>kubernetes</sbt.project.name>
-    <kubernetes.client.version>4.6.1</kubernetes.client.version>
+    <kubernetes.client.version>4.9.2</kubernetes.client.version>
   </properties>
 
   <dependencies>
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
index 575bc54ffe..a1559e07a4 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala
@@ -66,9 +66,7 @@ private[spark] class BasicDriverFeatureStep(
     val driverCpuQuantity = new QuantityBuilder(false)
       .withAmount(driverCpuCores)
       .build()
-    val driverMemoryQuantity = new QuantityBuilder(false)
-      .withAmount(s"${driverMemoryWithOverheadMiB}Mi")
-      .build()
+    val driverMemoryQuantity = new Quantity(s"${driverMemoryWithOverheadMiB}Mi")
     val maybeCpuLimitQuantity = driverLimitCores.map { limitCores =>
       ("cpu", new QuantityBuilder(false).withAmount(limitCores).build())
     }
diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
index d89995ba5e..b439ebf837 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
@@ -85,9 +85,7 @@ private[spark] class BasicExecutorFeatureStep(
     // name as the hostname.  This preserves uniqueness since the end of name contains
     // executorId
     val hostname = name.substring(Math.max(0, name.length - 63))
-    val executorMemoryQuantity = new QuantityBuilder(false)
-      .withAmount(s"${executorMemoryTotal}Mi")
-      .build()
+    val executorMemoryQuantity = new Quantity(s"${executorMemoryTotal}Mi")
     val executorCpuQuantity = new QuantityBuilder(false)
       .withAmount(executorCoresRequest)
       .build()