Communicating Infinispan Remote Exceptions Generates Excessive Network Traffic
When an exception occurs in our Infinispan cluster (version 9.4.8.Final), the node that hit the exception sends this information to the other nodes in the cluster. This appears to be by design.
This activity generates so much traffic that it causes timeout exceptions, which in turn make the nodes want to communicate their timeout exceptions to the other nodes. In production, our 3-node Infinispan cluster completely saturated a 20 Gb/s link.
For example, in a 2-node QA cluster, we observed the following:
Node 1:
ISPN000476: Timed out waiting for responses for request 7861 from node2
Node 2:
ISPN000217: Received exception from node1, see cause for remote stack trace
Further down in the stack trace printed on node 2, we see:
Timed out waiting for responses for request 7861 from node2
There are lots of these. We captured packets during this period and could see 50 KB packets containing lists of remote errors along with their entire Java stack traces.
When this happens, it becomes a kind of "perfect storm". Every timeout generates an error that gets sent over the network. That adds congestion and more timeouts. From there, things go downhill very quickly.
I know I need to address the timeouts themselves - look for GC pauses, etc., and perhaps consider increasing the timeout. However, I'd like to know whether there is a way to prevent this behavior while these events are occurring. When you think about it, node 1 times out talking to node 2 and then sends a copy of the error over the network to node 2 to tell it 'I timed out talking to you'.
That seems odd.
Is there any way to avoid transmitting these remote stack traces? Any insight or suggestions are greatly appreciated.
EDIT
Example stack trace:
2019-12-06 11:37:01,587 ERROR [org.keycloak.services.error.KeycloakErrorHandler] (default task-26) Uncaught server error: org.infinispan.remoting.RemoteException: ISPN000217: Received exception from ********, see cause for remote stack trace
at org.infinispan.remoting.transport.ResponseCollectors.wrapRemoteException(ResponseCollectors.java:28)
at org.infinispan.remoting.transport.ValidSingleResponseCollector.withException(ValidSingleResponseCollector.java:37)
at org.infinispan.remoting.transport.ValidSingleResponseCollector.addResponse(ValidSingleResponseCollector.java:21)
at org.infinispan.remoting.transport.impl.SingleTargetRequest.receiveResponse(SingleTargetRequest.java:52)
at org.infinispan.remoting.transport.impl.SingleTargetRequest.onResponse(SingleTargetRequest.java:35)
at org.infinispan.remoting.transport.impl.RequestRepository.addResponse(RequestRepository.java:52)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processResponse(JGroupsTransport.java:1372)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1275)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport.access0(JGroupsTransport.java:126)
at org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1420)
at org.jgroups.JChannel.up(JChannel.java:816)
at org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:133)
at org.jgroups.stack.Protocol.up(Protocol.java:340)
at org.jgroups.protocols.FORK.up(FORK.java:141)
at org.jgroups.protocols.FRAG3.up(FRAG3.java:171)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:339)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:872)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:240)
at org.jgroups.protocols.UNICAST3.deliverMessage(UNICAST3.java:1008)
at org.jgroups.protocols.UNICAST3.handleDataReceived(UNICAST3.java:734)
at org.jgroups.protocols.UNICAST3.up(UNICAST3.java:389)
at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:590)
at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:131)
at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:203)
at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:253)
at org.jgroups.protocols.MERGE3.up(MERGE3.java:280)
at org.jgroups.protocols.Discovery.up(Discovery.java:295)
at org.jgroups.protocols.TP.passMessageUp(TP.java:1249)
at org.jgroups.util.SubmitToThreadPool$SingleMessageHandler.run(SubmitToThreadPool.java:87)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.jboss.as.clustering.jgroups.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:52)
at java.lang.Thread.run(Thread.java:745)
Suppressed: org.infinispan.util.logging.TraceException
at org.infinispan.interceptors.impl.SimpleAsyncInvocationStage.get(SimpleAsyncInvocationStage.java:41)
at org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invoke(AsyncInterceptorChainImpl.java:250)
at org.infinispan.cache.impl.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:1918)
at org.infinispan.cache.impl.CacheImpl.put(CacheImpl.java:1433)
at org.infinispan.cache.impl.DecoratedCache.put(DecoratedCache.java:685)
at org.infinispan.cache.impl.DecoratedCache.put(DecoratedCache.java:240)
at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:116)
at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:116)
at org.infinispan.cache.impl.EncoderCache.put(EncoderCache.java:195)
at org.infinispan.cache.impl.AbstractDelegatingCache.put(AbstractDelegatingCache.java:116)
at org.keycloak.cluster.infinispan.InfinispanNotificationsManager.notify(InfinispanNotificationsManager.java:155)
at org.keycloak.cluster.infinispan.InfinispanClusterProvider.notify(InfinispanClusterProvider.java:130)
at org.keycloak.models.cache.infinispan.CacheManager.sendInvalidationEvents(CacheManager.java:206)
at org.keycloak.models.cache.infinispan.UserCacheSession.runInvalidations(UserCacheSession.java:140)
at org.keycloak.models.cache.infinispan.UserCacheSession.commit(UserCacheSession.java:152)
at org.keycloak.services.DefaultKeycloakTransactionManager.commit(DefaultKeycloakTransactionManager.java:146)
at org.keycloak.services.resources.admin.UsersResource.createUser(UsersResource.java:125)
at sun.reflect.GeneratedMethodAccessor487.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:139)
at org.jboss.resteasy.core.ResourceMethodInvoker.internalInvokeOnTarget(ResourceMethodInvoker.java:510)
at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTargetAfterFilter(ResourceMethodInvoker.java:400)
at org.jboss.resteasy.core.ResourceMethodInvoker.lambda$invokeOnTarget$0(ResourceMethodInvoker.java:364)
at org.jboss.resteasy.core.interception.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:355)
at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:366)
at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:338)
at org.jboss.resteasy.core.ResourceLocatorInvoker.invokeOnTargetObject(ResourceLocatorInvoker.java:137)
at org.jboss.resteasy.core.ResourceLocatorInvoker.invoke(ResourceLocatorInvoker.java:106)
at org.jboss.resteasy.core.ResourceLocatorInvoker.invokeOnTargetObject(ResourceLocatorInvoker.java:132)
at org.jboss.resteasy.core.ResourceLocatorInvoker.invoke(ResourceLocatorInvoker.java:106)
at org.jboss.resteasy.core.ResourceLocatorInvoker.invokeOnTargetObject(ResourceLocatorInvoker.java:132)
at org.jboss.resteasy.core.ResourceLocatorInvoker.invoke(ResourceLocatorInvoker.java:100)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:439)
at org.jboss.resteasy.core.SynchronousDispatcher.lambda$invoke(SynchronousDispatcher.java:229)
at org.jboss.resteasy.core.SynchronousDispatcher.lambda$preprocess$0(SynchronousDispatcher.java:135)
at org.jboss.resteasy.core.interception.PreMatchContainerRequestContext.filter(PreMatchContainerRequestContext.java:355)
at org.jboss.resteasy.core.SynchronousDispatcher.preprocess(SynchronousDispatcher.java:138)
at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:215)
at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:227)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:791)
at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:74)
at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:129)
at org.keycloak.services.filters.KeycloakSessionServletFilter.doFilter(KeycloakSessionServletFilter.java:90)
at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61)
at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131)
at io.undertow.servlet.handlers.FilterHandler.handleRequest(FilterHandler.java:84)
at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62)
at io.undertow.servlet.handlers.ServletChain.handleRequest(ServletChain.java:68)
at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36)
at org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(SecurityContextAssociationHandler.java:78)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:132)
at io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:57)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:46)
at io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:64)
at io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:60)
at io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:77)
at io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(NotificationReceiverHandler.java:50)
at io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(AbstractSecurityContextAssociationHandler.java:43)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(JACCContextIdHandler.java:61)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at org.wildfly.extension.undertow.deployment.GlobalRequestControllerHandler.handleRequest(GlobalRequestControllerHandler.java:68)
at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:292)
at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
at io.undertow.servlet.handlers.ServletInitialHandler.call(ServletInitialHandler.java:138)
at io.undertow.servlet.handlers.ServletInitialHandler.call(ServletInitialHandler.java:135)
at io.undertow.servlet.core.ServletRequestContextThreadSetupAction.call(ServletRequestContextThreadSetupAction.java:48)
at io.undertow.servlet.core.ContextClassLoaderSetupAction.call(ContextClassLoaderSetupAction.java:43)
at org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502)
at org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1502)
at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:272)
at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
at io.undertow.servlet.handlers.ServletInitialHandler.handleRequest(ServletInitialHandler.java:104)
at io.undertow.server.Connectors.executeRootHandler(Connectors.java:364)
at io.undertow.server.HttpServerExchange.run(HttpServerExchange.java:830)
at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1982)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1363)
... 1 more
Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 7865 from ********
at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
No, it is not possible to disable stack trace serialization in Infinispan 9.4.x.
Infinispan 10.0.0.Final does not include the stack trace in exception responses, but that was just a side effect of some other work, and I have opened ISPN-11022 to add the remote stack traces back.
Please add a comment to the issue and include an example of a full stack trace.
We were able to resolve the problem that was causing the surge in network traffic. Details below.
tl;dr:
We switched from the JGroups UDP stack to the TCP stack, which the ISPN documentation notes is more efficient for smaller clusters that use distributed caches, as ours does.
Reproducing the problem
To reproduce the problem, we did the following:
- Configured the Keycloak background job that clears ISPN cache entries to run every 60 seconds, so we didn't have to wait for the job to reach its 15-minute run interval (standalone-ha.xml):
<scheduled-task-interval>60</scheduled-task-interval>
- Generated a large number of user sessions (we used jmeter). In our testing, we ended up generating roughly 100,000 sessions.
- Configured the SSO Idle and Max time TTLs in Keycloak to be extremely short (60 seconds) so that all of the sessions would expire
- Kept applying load to the system with jmeter (this part is important)
When the job that clears the caches ran, we saw the network flooded with traffic (120 MB/s). While this was happening, we saw huge numbers of the following errors on every node in the cluster:
ISPN000476: Timed out waiting for responses for request 7861 from node2
ISPN000217: Received exception from node1, see cause for remote stack trace
Pro tip: configure passivation to a file store to persist your ISPN data. Shut down the cluster and save the ".dat" files somewhere else. Use those files to instantly restore the ISPN cluster's state between tests.
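As a rough sketch of what that can look like (the cache name and store path below are illustrative, not taken from our actual configuration), passivation to a file store can be enabled on a cache in the infinispan subsystem of standalone-ha.xml; the single file store then writes a "<cache-name>.dat" file under the configured path:
<!-- Hypothetical example: the "sessions" cache persisted to <data dir>/ispn-data/sessions.dat -->
<distributed-cache name="sessions" owners="2">
    <file-store path="ispn-data" relative-to="jboss.server.data.dir" passivation="true" purge="false"/>
</distributed-cache>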
Resolving the problem
Using the technique above, we were able to reproduce the problem on demand. So we set out to resolve it using the approach described below.
Changing the JGroups stack to use TCP
We changed the JGroups stack from UDP to TCP and also configured TCPPING for discovery. We did this after reading the description of the TCP stack in the following guide:
Specifically:
"Uses TCP for transport and UDP multicast for discovery. Suitable for
smaller clusters (under 100 nodes) only if you are using distributed
caches because TCP is more efficient than UDP as a point-to-point
protocol."
This change completely resolved the problem for us.
Our Wildfly 16 standalone-ha.xml configuration is shown below:
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
<channels default="ee">
<channel name="ee" stack="tcp" cluster="ejb"/>
</channels>
<stacks>
<stack name="udp">
<transport type="UDP" socket-binding="jgroups-udp"/>
<protocol type="PING"/>
<protocol type="MERGE3"/>
<protocol type="FD_SOCK"/>
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
<protocol type="pbcast.NAKACK2"/>
<protocol type="UNICAST3"/>
<protocol type="pbcast.STABLE"/>
<protocol type="pbcast.GMS"/>
<protocol type="UFC"/>
<protocol type="MFC"/>
<protocol type="FRAG3"/>
</stack>
<stack name="tcp">
<transport type="TCP" socket-binding="jgroups-tcp"/>
<socket-protocol type="TCPPING" socket-binding="jgroups-tcp">
<property name="initial_hosts">HOST-X[7600],HOST-Y[7600],HOST-Z[7600]</property>
<property name="port_range">1</property>
</socket-protocol>
<protocol type="MERGE3"/>
<protocol type="FD_SOCK"/>
<protocol type="FD_ALL"/>
<protocol type="VERIFY_SUSPECT"/>
<protocol type="pbcast.NAKACK2"/>
<protocol type="UNICAST3"/>
<protocol type="pbcast.STABLE"/>
<protocol type="pbcast.GMS"/>
<protocol type="MFC"/>
<protocol type="FRAG3"/>
</stack>
</stacks>
</subsystem>
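For reference, the jgroups-tcp socket binding that the TCP transport and TCPPING refer to above is the standard one from the socket-binding-group in a default standalone-ha.xml (the interface name may differ in your environment):
<socket-binding name="jgroups-tcp" interface="private" port="7600"/>
With port_range set to 1, TCPPING probes ports 7600-7601 on each host listed in initial_hosts.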
Tuning the JVM garbage collector parameters
We followed some of the recommendations in the ISPN tuning guide:
https://infinispan.org/docs/stable/titles/tuning/tuning.html
In particular, we moved from the JDK 8 default GC to the CMS collector. Specifically, our Wildfly servers are now started with the following JVM arguments:
-Xms6144m -Xmx6144m -Xmn1536M -XX:MetaspaceSize=192M -XX:MaxMetaspaceSize=512m -Djava.net.preferIPv4Stack=true -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+DisableExplicitGC
Other changes
We applied other changes that are specific to our environment:
- Blocked IP multicast + UDP at the iptables level (because we wanted to be sure we were only using TCP)
- Configured a bandwidth cap at the network level to prevent the ISPN cluster from saturating the network and affecting other hosts that share the same link.