ServiceFabric: Service does not exist during deployment

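I have an existing system that uses Service Fabric. Everything works well, except that during a service release the service is unavailable and any attempt to resolve it returns an error.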

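This is expected, but it would be nicer if calls made during that window simply waited or timed out. During these periods my error log sometimes fills up with 200,000 lines of the same error.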

I would like something like the code below, but where would it go?

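public async Task Execute(Func<Task> action)
{
    try
    {
        // Await the call so a failure is observed inside the try block.
        await action()
            .ConfigureAwait(false);
    }
    catch (FabricServiceNotFoundException)
    {
        // ?? = delay still to be decided
        await Task.Delay(TimeSpan.FromSeconds(??))
            .ConfigureAwait(false);

        // Single retry after the delay.
        await action()
            .ConfigureAwait(false);
    }
}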

The error:

System.Fabric.FabricServiceNotFoundException: Service does not exist. ---> System.Runtime.InteropServices.COMException: Exception from HRESULT: 0x80071BCD
   at System.Fabric.Interop.NativeClient.IFabricServiceManagementClient6.EndResolveServicePartition(IFabricAsyncOperationContext context)
   at System.Fabric.FabricClient.ServiceManagementClient.ResolveServicePartitionEndWrapper(IFabricAsyncOperationContext context)
   at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
   --- End of inner exception stack trace ---
   at Microsoft.ServiceFabric.Services.Client.ServicePartitionResolver.ResolveHelperAsync(Func`5 resolveFunc, ResolvedServicePartition previousRsp, TimeSpan resolveTimeout, TimeSpan maxRetryInterval, CancellationToken cancellationToken, Uri serviceUri)
   at Microsoft.ServiceFabric.Services.Communication.Client.CommunicationClientFactoryBase`1.CreateClientWithRetriesAsync(ResolvedServicePartition previousRsp, TargetReplicaSelector targetReplicaSelector, String listenerName, OperationRetrySettings retrySettings, Boolean doInitialResolve, CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Services.Communication.Client.CommunicationClientFactoryBase`1.GetClientAsync(ResolvedServicePartition previousRsp, TargetReplicaSelector targetReplica, String listenerName, OperationRetrySettings retrySettings, CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Services.Remoting.V2.FabricTransport.Client.FabricTransportServiceRemotingClientFactory.GetClientAsync(ResolvedServicePartition previousRsp, TargetReplicaSelector targetReplicaSelector, String listenerName, OperationRetrySettings retrySettings, CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Services.Communication.Client.ServicePartitionClient`1.GetCommunicationClientAsync(CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Services.Communication.Client.ServicePartitionClient`1.InvokeWithRetryAsync[TResult](Func`2 func, CancellationToken cancellationToken, Type[] doNotRetryExceptionTypes)
   at Microsoft.ServiceFabric.Services.Remoting.V2.Client.ServiceRemotingPartitionClient.InvokeAsync(IServiceRemotingRequestMessage remotingRequestMessage, String methodName, CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Services.Remoting.Builder.ProxyBase.InvokeAsyncV2(Int32 interfaceId, Int32 methodId, String methodName, IServiceRemotingRequestMessageBody requestMsgBodyValue, CancellationToken cancellationToken)
   at Microsoft.ServiceFabric.Services.Remoting.Builder.ProxyBase.ContinueWithResultV2[TRetval](Int32 interfaceId, Int32 methodId, Task`1 task)

As expected, Service Fabric has to shut the service down in order to start the new version, which causes transient errors like the one you are seeing.

By default, the Remoting API already has retry logic built in. From the docs:

The service proxy handles all failover exceptions for the service partition it is created for. It re-resolves the endpoints if there are failover exceptions (non-transient exceptions) and retries the call with the correct endpoint. The number of retries for failover exceptions is indefinite. If transient exceptions occur, the proxy retries the call.

That said, you should not need to add extra retry logic. You could try tuning OperationRetrySettings so that these retries are handled better.
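As a rough sketch of what tuning OperationRetrySettings could look like: the settings object is passed to a ServiceProxyFactory, and proxies created from that factory use it. The interface `IMyService`, the URI `fabric:/MyApp/MyService`, and the numeric values are placeholders invented for illustration. Whether these settings change how FabricServiceNotFoundException in particular is handled is not confirmed here, so treat this as a starting point to experiment with.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Communication.Client;
using Microsoft.ServiceFabric.Services.Remoting;
using Microsoft.ServiceFabric.Services.Remoting.Client;

// Hypothetical remoting contract, standing in for your own service interface.
public interface IMyService : IService
{
    Task DoSomethingAsync();
}

public static class ProxyConfig
{
    public static IMyService CreateProxy()
    {
        // Illustrative values only; pick them to match your deployment window.
        var retrySettings = new OperationRetrySettings(
            TimeSpan.FromSeconds(5),   // max backoff interval on transient errors
            TimeSpan.FromSeconds(5),   // max backoff interval on non-transient errors
            20);                       // default max retry count

        // Proxies created by this factory use the retry settings above.
        var factory = new ServiceProxyFactory(retrySettings);

        // Placeholder service URI (assumption).
        return factory.CreateServiceProxy<IMyService>(
            new Uri("fabric:/MyApp/MyService"));
    }
}
```

Creating the factory once and reusing it for all proxies keeps the retry configuration in a single place.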

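If that does not solve the problem and you still want to add logic in your own code, the simplest approach is to use a transient-fault-handling library such as Polly, like this: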

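   using System.Fabric;
   using Polly;

   // Retry three times, waiting 1, 2 and 3 seconds between attempts.
   var policy = Policy
       .Handle<FabricServiceNotFoundException>()
       .WaitAndRetry(new[]
       {
           TimeSpan.FromSeconds(1),
           TimeSpan.FromSeconds(2),
           TimeSpan.FromSeconds(3)
       });

   // DoSomething is a placeholder for the call to the service.
   policy.Execute(() => DoSomething());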

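In this example the wait grows between retries (1, 2, then 3 seconds, so a linear rather than a truly exponential backoff). If the number of calls is very high, I would recommend using a circuit breaker approach instead.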
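For the async `Execute(Func<Task>)` shape from the question, the same idea could be sketched with Polly's async policies, combining an exponential backoff with a circuit breaker. This assumes the classic Polly policy API (roughly v7). The class name `ResilientCalls` and all counts and durations are illustrative choices, not recommendations.

```csharp
using System;
using System.Fabric;
using System.Threading.Tasks;
using Polly;

public static class ResilientCalls
{
    // Retry up to 5 times, waiting 2, 4, 8, 16 and 32 seconds (illustrative values).
    private static readonly IAsyncPolicy Retry = Policy
        .Handle<FabricServiceNotFoundException>()
        .WaitAndRetryAsync(5, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

    // After 3 consecutive failures, reject calls for 30 seconds.
    // While the circuit is open Polly throws BrokenCircuitException, which the
    // retry policy above does not handle, so callers fail fast.
    private static readonly IAsyncPolicy Breaker = Policy
        .Handle<FabricServiceNotFoundException>()
        .CircuitBreakerAsync(3, TimeSpan.FromSeconds(30));

    // Retry is the outer policy and the breaker the inner one; the policies are
    // static so the breaker state is shared across all calls.
    private static readonly IAsyncPolicy Combined = Policy.WrapAsync(Retry, Breaker);

    public static Task Execute(Func<Task> action) => Combined.ExecuteAsync(action);
}
```

A call site would then look like `await ResilientCalls.Execute(() => proxy.DoSomethingAsync());`, where `proxy` is whatever service proxy you already use. Callers should be prepared to catch `BrokenCircuitException` (from `Polly.CircuitBreaker`) while the circuit is open.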