RemoteCertificateNameMismatch error on .NET Core API in Kubernetes with AzureAD Auth

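I'm trying to create a web API (ASP.NET Core, using Azure AD OAuth for authorization) that runs in Kubernetes (bare metal, using NGINX-Ingress). Running the API in IIS Express works without errors, but after building it into a Docker image and deploying it to the cluster, the application randomly throws the following exception on any request: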

fail: Microsoft.AspNetCore.Server.Kestrel[13]
      Connection id "0HMBENKCJR3ER", Request id "0HMBENKCJR3ER:00000003": An unhandled exception was thrown by the application.
      System.InvalidOperationException: IDX20803: Unable to obtain configuration from: 'System.String'.
       ---> System.IO.IOException: IDX20804: Unable to retrieve document from: 'System.String'.
       ---> System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
       ---> System.Security.Authentication.AuthenticationException: The remote certificate is invalid according to the validation procedure: RemoteCertificateNameMismatch
         at System.Net.Security.SslStream.SendAuthResetSignal(ProtocolToken message, ExceptionDispatchInfo exception)
         at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm)
         at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Boolean async, Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
         --- End of inner exception stack trace ---
         at System.Net.Http.ConnectHelper.EstablishSslConnectionAsyncCore(Boolean async, Stream stream, SslClientAuthenticationOptions sslOptions, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.DiagnosticsHandler.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpClient.SendAsyncCore(HttpRequestMessage request, HttpCompletionOption completionOption, Boolean async, Boolean emitTelemetryStartStop, CancellationToken cancellationToken)
         at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.GetDocumentAsync(String address, CancellationToken cancel)
         --- End of inner exception stack trace ---
         at Microsoft.IdentityModel.Protocols.HttpDocumentRetriever.GetDocumentAsync(String address, CancellationToken cancel)
         at Microsoft.Identity.Web.InstanceDiscovery.IssuerConfigurationRetriever.GetConfigurationAsync(String address, IDocumentRetriever retriever, CancellationToken cancel)
         at Microsoft.IdentityModel.Protocols.ConfigurationManager`1.GetConfigurationAsync(CancellationToken cancel)
         --- End of inner exception stack trace ---
         at Microsoft.IdentityModel.Protocols.ConfigurationManager`1.GetConfigurationAsync(CancellationToken cancel)
         at Microsoft.IdentityModel.Protocols.ConfigurationManager`1.GetConfigurationAsync()
         at Microsoft.Identity.Web.Resource.AadIssuerValidator.GetIssuerValidator(String aadAuthority)
         at Microsoft.Identity.Web.MicrosoftIdentityWebApiAuthenticationBuilderExtensions.<>c__DisplayClass3_0.<AddMicrosoftIdentityWebApiImplementation>b__0(JwtBearerOptions options, IServiceProvider serviceProvider, IOptionsMonitor`1 microsoftIdentityOptionsMonitor)
         at Microsoft.Extensions.Options.ConfigureNamedOptions`3.Configure(String name, TOptions options)
         at Microsoft.Extensions.Options.OptionsFactory`1.Create(String name)
         at Microsoft.Extensions.Options.OptionsMonitor`1.<>c__DisplayClass11_0.<Get>b__0()
         at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)
      --- End of stack trace from previous location ---
         at System.Lazy`1.CreateValue()
         at System.Lazy`1.get_Value()
         at Microsoft.Extensions.Options.OptionsCache`1.GetOrAdd(String name, Func`1 createOptions)
         at Microsoft.Extensions.Options.OptionsMonitor`1.Get(String name)
         at Microsoft.AspNetCore.Authentication.AuthenticationHandler`1.InitializeAsync(AuthenticationScheme scheme, HttpContext context)
         at Microsoft.AspNetCore.Authentication.AuthenticationHandlerProvider.GetHandlerAsync(HttpContext context, String authenticationScheme)
         at Microsoft.AspNetCore.Authentication.AuthenticationService.AuthenticateAsync(HttpContext context, String scheme)
         at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)

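Sometimes the pods work fine, and sometimes they fail with this error on every request; which of the two happens seems to change randomly with each deployment. The NGINX-Ingress on the cluster is fully configured with its own certificate and intermediate certificates, and it serves similar APIs that do not use authorization over HTTPS without errors.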

Here is the Dockerfile for the image:

FROM mcr.microsoft.com/dotnet/aspnet:5.0-buster-slim AS base
RUN apt-get update \
    && apt-get install -y --no-install-recommends libgdiplus libc6-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
EXPOSE 80
EXPOSE 443

FROM mcr.microsoft.com/dotnet/sdk:5.0-buster-slim AS build
WORKDIR /src
COPY ["AuthTest/AuthTest.csproj", "AuthTest/"]
RUN dotnet restore "AuthTest/AuthTest.csproj"
COPY . .
WORKDIR "/src/AuthTest"
RUN dotnet build "AuthTest.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "AuthTest.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .

ENTRYPOINT ["dotnet", "AuthTest.dll"]

Here are the .yaml definitions for the deployment and the ingress:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: authtest-dep
  labels:
    app: authtest
spec:
  selector:
    matchLabels:
      app: authtest-app
  replicas: 4
  template:
    metadata:
      labels:
        app: authtest-app
    spec:
      containers:
        - name: authtest-app
          image: authtest:latest
          imagePullPolicy: Never

---

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: authtest-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/x-forwarded-prefix: /api/auth
spec:
  tls:
  - hosts:
      - valid.hostname.com
    secretName: secret-tls
  rules:
  - host: valid.hostname.com
    http:
      paths:
        - path: /api/auth/(.*)
          pathType: Prefix
          backend:
            service:
              name: authtest-service
              port:
                number: 80

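I tried including our own certificate (not self-signed) in the Docker image, and I also tried overriding certificate validation to see which certificate was failing, but without success. I couldn't find any answers on Stack Overflow, since most of them either revolve around self-signed certificates or involve disabling certificate validation, which seems counterproductive. My question is: which certificate is causing the error, and how do I fix it?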

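After a lot of searching and diagnostics, I found the solution.

In my case, DNS was misbehaving. When the API pod tried to connect to login.microsoftonline.com, it first tried to resolve the name against the cluster's DNS search domains, with the following result: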

[INFO] 10.244.1.60:53217 - 25833 "AAAA IN login.microsoftonline.com.default.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.000272099s
[INFO] 10.244.1.60:53217 - 30678 "A IN login.microsoftonline.com.default.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd 162 0.000654896s
[INFO] 10.244.1.59:42740 - 22396 "AAAA IN login.microsoftonline.com.svc.cluster.local. udp 61 false 512" NXDOMAIN qr,aa,rd 154 0.000201999s
[INFO] 10.244.1.59:42740 - 25712 "A IN login.microsoftonline.com.svc.cluster.local. udp 61 false 512" NXDOMAIN qr,aa,rd 154 0.000690095s
[INFO] 10.244.1.59:44797 - 49225 "A IN login.microsoftonline.com.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000318898s
[INFO] 10.244.1.59:44797 - 60243 "AAAA IN login.microsoftonline.com.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000847195s
[INFO] 10.244.1.59:53903 - 63962 "AAAA IN login.microsoftonline.com.mydomain.com. udp 57 false 512" NXDOMAIN qr,aa,rd,ra 152 0.001664889s
[INFO] 10.244.1.59:53903 - 58575 "A IN login.microsoftonline.com.mydomain.com. udp 57 false 512" NOERROR qr,aa,rd,ra 112 0.001311591s
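
In a longer log, the offending lookup can be hard to spot by eye. The following is a minimal sketch, assuming the CoreDNS log format shown above, that extracts the query name and response code from each line and lists the names that were answered with NOERROR. The function names `parse_coredns_line` and `answered_names` are hypothetical helpers, not part of any tool.

```python
import re

# Matches the quoted query section and the response code that follows it,
# e.g. '"A IN name. udp 57 false 512" NOERROR ...' (format assumed from the log above).
LINE_RE = re.compile(r'"(?P<qtype>\S+) IN (?P<name>\S+) [^"]*" (?P<rcode>[A-Z]+)')


def parse_coredns_line(line):
    """Return (qtype, name, rcode) for a log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return m.group("qtype"), m.group("name"), m.group("rcode")


def answered_names(lines):
    """Return the query names that received a NOERROR response."""
    hits = []
    for line in lines:
        parsed = parse_coredns_line(line)
        if parsed and parsed[2] == "NOERROR":
            hits.append(parsed[1])
    return hits


sample = [
    '[INFO] 10.244.1.59:44797 - 49225 "A IN login.microsoftonline.com.cluster.local. udp 57 false 512" NXDOMAIN qr,aa,rd 150 0.000318898s',
    '[INFO] 10.244.1.59:53903 - 58575 "A IN login.microsoftonline.com.mydomain.com. udp 57 false 512" NOERROR qr,aa,rd,ra 112 0.001311591s',
]
print(answered_names(sample))
```

Any answered name that still carries a search-domain suffix after the real hostname is a candidate for the kind of misresolution described here.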

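The DNS server incorrectly returned a NOERROR result for login.microsoftonline.com.mydomain.com, so the pod connected to an address that serves my own certificate. Running curl inside the pod showed: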

$ curl -v https://login.microsoftonline.com

* Server certificate:
*  subject: CN=*.mydomain.com
*  start date: Mar 11 00:00:00 2021 GMT
*  expire date: Apr 11 23:59:59 2022 GMT
*  subjectAltName does not match login.microsoftonline.com
* SSL: no alternative certificate subject name matches target host name 'login.microsoftonline.com'

This is what caused the RemoteCertificateNameMismatch error.

I found two ways to work around it:
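
The mismatch arises because the TLS client validates the certificate against the hostname it asked for (login.microsoftonline.com), not against the name that DNS ended up resolving. The following is a simplified sketch of that comparison, assuming a wildcard may only stand in for the single leftmost label. The function `hostname_matches` is a hypothetical illustration and not the validation code that .NET or curl actually runs.

```python
def hostname_matches(pattern, hostname):
    """Simplified check of a certificate name pattern against a requested hostname."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # The wildcard covers exactly one leftmost label; the rest must be identical.
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


# The certificate served by the wrongly resolved address, against the requested host.
print(hostname_matches("*.mydomain.com", "login.microsoftonline.com"))
# The same certificate against a host it was actually issued for.
print(hostname_matches("*.mydomain.com", "api.mydomain.com"))
```

The first call returns False and the second returns True, which is why the same certificate works for the cluster's own APIs but fails for the Azure AD metadata request.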

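  1. Use a fully qualified domain name by appending a dot to the hostname in the URL (for example, google.com. instead of google.com). This skips the search domains, so the name is resolved exactly as written. Unfortunately, this did not work for login.microsoftonline.com, so I went with option 2.
  2. Adjust the ndots option in the pod's DNS configuration by adding the following dnsConfig under the pod spec (spec.template.spec in a Deployment):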
spec:
  containers:
    - name: authtest
      image: authtest:latest
      imagePullPolicy: Never
  dnsConfig:
    options:
      - name: ndots
        value: "2"

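By default, ndots is set to 5 in a pod. This means that any name with fewer than five dots is not treated as absolute: the resolver first tries it with each of the local search domains appended, and only then tries it as an absolute name.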

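With ndots set to 2, login.microsoftonline.com (which contains two dots) is treated as an absolute name straight away, so the incorrect internal resolution no longer occurs.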
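
The effect of ndots can be sketched as the order in which a resolver tries candidate names. This is a minimal model, assuming the usual resolv.conf behaviour (names with at least ndots dots are tried as-is first, other names go through the search list first) and a search list inferred from the DNS log above. The function `candidate_names` is a hypothetical helper.

```python
def candidate_names(name, search, ndots):
    """Return the names a resolver would try, in order, under this simplified model."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search domains are used.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


# Search list assumed from the order of the queries in the log.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local", "mydomain.com"]

for ndots in (5, 2):
    first = candidate_names("login.microsoftonline.com", search, ndots)[0]
    print(ndots, first)
```

With ndots at 5 the first candidate is login.microsoftonline.com.default.svc.cluster.local., and the search eventually reaches the mydomain.com suffix that was wrongly answered. With ndots at 2 the first candidate is login.microsoftonline.com. itself, so the search list is never consulted once that lookup succeeds.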

This could be considered a band-aid over the underlying incorrect DNS resolution, but in my case it solved the problem.