日志中的 Artemis 主从故障转移和同步错误消息

Artemis Master-Slave failover and sync error message in logs

在使用 Artemis 2.11 和更旧的 Java 版本后,我决定将我的整个系统更新为当前可用的“最新最棒的软件”。所以我在两个 Ubuntu 18.04 VM 上使用 Artemis 2.14 和 Java 14.0.2,具有 4 个内核和 16 GB RAM。

我这样配置主从 ha-policy:

大师:

<ha-policy>
    <replication>
       <master>
          <check-for-live-server>true</check-for-live-server>
       </master>
    </replication>
</ha-policy>

奴隶

<ha-policy>
    <replication>
        <slave>
           <allow-failback>true</allow-failback>
        </slave>
    </replication>
</ha-policy>

我正在使用这样的集群连接:

硕士

<cluster-connections>
  <cluster-connection name="artemis-cluster">
    <connector-ref>Artemis-Node-A-Sync</connector-ref>
    <retry-interval>500</retry-interval>
    <use-duplicate-detection>true</use-duplicate-detection>
    <message-load-balancing>ON_DEMAND</message-load-balancing>
    <max-hops>1</max-hops>
    <static-connectors>
      <connector-ref>Node-B-Sync</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>

奴隶

<cluster-connections>
  <cluster-connection name="artemis-cluster">
    <connector-ref>Node-B-Sync</connector-ref>
    <retry-interval>500</retry-interval>
    <use-duplicate-detection>true</use-duplicate-detection>
    <message-load-balancing>ON_DEMAND</message-load-balancing>
    <max-hops>1</max-hops>
    <static-connectors>
      <connector-ref>Node-A-Sync</connector-ref>
    </static-connectors>
  </cluster-connection>
</cluster-connections>

我的问题是我从 SLAVE...

收到这条 ERROR 消息
2020-08-07 12:45:37,548 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-08-07 12:45:37,550 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-08-07 12:45:37,552 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /opt/mybroker-broker/tmp/uploads
2020-08-07 12:45:37,565 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-08-07 12:45:37,585 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/mybroker-broker/etc/jolokia-access.xml]
2020-08-07 12:45:37,605 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-08-07 12:45:37,703 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, *.*.*.*, *.*.*.*, localhost.localdomain]
2020-08-07 12:45:37,966 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-08-07 12:45:37,967 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-08-07 12:45:37,967 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-08-07 12:45:46,905 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure to 192.168.144.75/192.168.144.75:22522 has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
2020-08-07 12:45:50,678 INFO  [org.apache.activemq.artemis.core.server] AMQ221066: Initiating quorum vote: LiveFailoverQuorumVote
2020-08-07 12:45:50,678 INFO  [org.apache.activemq.artemis.core.server] AMQ221084: Requested 0 quorum votes
2020-08-07 12:45:50,679 INFO  [org.apache.activemq.artemis.core.server] AMQ221083: ignoring quorum vote as max cluster size is 1.
2020-08-07 12:45:50,679 INFO  [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
2020-08-07 12:45:50,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]
        at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:310) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:3946) [artemis-server-2.14.0.jar:2.14.0]

master 的日志文件是:

2020-08-07 12:44:58,292 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 21% loaded
2020-08-07 12:44:58,540 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 42% loaded
2020-08-07 12:44:59,020 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 64% loaded
2020-08-07 12:44:59,416 INFO  [org.apache.activemq.artemis.core.server] AMQ221014: 85% loaded
2020-08-07 12:45:12,143 INFO  [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address DLQ supporting [ANYCAST]
2020-08-07 12:45:12,145 INFO  [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue DLQ on address DLQ
2020-08-07 12:45:12,151 INFO  [org.apache.activemq.artemis.core.server] AMQ221080: Deploying address ExpiryQueue supporting [ANYCAST]
2020-08-07 12:45:12,152 INFO  [org.apache.activemq.artemis.core.server] AMQ221003: Deploying ANYCAST queue ExpiryQueue on address ExpiryQueue
2020-08-07 12:45:12,382 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:1883 for protocols [MQTT]
2020-08-07 12:45:12,385 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:22522 for protocols [CORE,MQTT,AMQP,STOMP,HORNETQ,OPENWIRE]
2020-08-07 12:45:12,387 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:8883 for protocols [MQTT]
2020-08-07 12:45:12,388 INFO  [org.apache.activemq.artemis.core.server] AMQ221020: Started EPOLL Acceptor at 0.0.0.0:18884 for protocols [MQTT]
2020-08-07 12:45:12,388 INFO  [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
2020-08-07 12:45:12,389 INFO  [org.apache.activemq.artemis.core.server] AMQ221001: Apache ActiveMQ Artemis Message Broker version 2.14.0 [0.0.0.0, nodeID=95f808d9-d641-11ea-9c48-005056073c33]
2020-08-07 12:45:12,650 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Initialized activemq-branding plugin
2020-08-07 12:45:12,686 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Initialized artemis-plugin plugin
2020-08-07 12:45:13,281 INFO  [io.hawt.HawtioContextListener] Initialising hawtio services
2020-08-07 12:45:13,304 INFO  [io.hawt.system.ConfigManager] Configuration will be discovered via system properties
2020-08-07 12:45:13,306 INFO  [io.hawt.jmx.JmxTreeWatcher] Welcome to hawtio 1.5.12 : http://hawt.io/ : Don't cha wish your console was hawt like me? ;-)
2020-08-07 12:45:13,311 INFO  [io.hawt.jmx.UploadManager] Using file upload directory: /opt/mybroker-broker/tmp/uploads
2020-08-07 12:45:13,325 INFO  [io.hawt.web.AuthenticationFilter] Starting hawtio authentication filter, JAAS realm: "activemq" authorized role(s): "amq" role principal classes: "org.apache.activemq.artemis.spi.core.security.jaas.RolePrincipal"
2020-08-07 12:45:13,344 INFO  [io.hawt.web.JolokiaConfiguredAgentServlet] Jolokia overridden property: [key=policyLocation, value=file:/opt/mybroker-broker/etc/jolokia-access.xml]
2020-08-07 12:45:13,373 INFO  [io.hawt.web.RBACMBeanInvoker] Using MBean [hawtio:type=security,area=jmx,rank=0,name=HawtioDummyJMXSecurity] for role based access control
2020-08-07 12:45:13,472 INFO  [io.hawt.system.ProxyWhitelist] Initial proxy whitelist: [localhost, 127.0.0.1, *.*.*.*, 192.168.144.75, localhost.localdomain]
2020-08-07 12:45:13,680 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://0.0.0.0:8161
2020-08-07 12:45:13,681 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://0.0.0.0:8161/console/jolokia
2020-08-07 12:45:13,681 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://0.0.0.0:8161/console
2020-08-07 12:45:46,608 WARN  [org.apache.activemq.artemis.core.server] AMQ222010: Critical IO Error, shutting down the server. file=NIOSequentialFile /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page, message=/opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files): ActiveMQIOErrorException[errorType=IO_ERROR message=/opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:156) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:98) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.Page.open(Page.java:483) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.openNewPage(PagingStoreImpl.java:1136) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.forceAnotherPage(PagingStoreImpl.java:606) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.getPageInformationForSync(JournalStorageManager.java:738) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:665) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation.run(SharedNothingLiveActivation.java:179) [artemis-server-2.14.0.jar:2.14.0]
        at java.base/java.lang.Thread.run(Thread.java:832) [java.base:]
Caused by: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)
        at java.base/java.io.RandomAccessFile.open0(Native Method) [java.base:]
        at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:347) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:261) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) [java.base:]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:143) [artemis-journal-2.14.0.jar:2.14.0]
        ... 8 more

2020-08-07 12:45:46,615 WARN  [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)
        at java.base/java.io.RandomAccessFile.open0(Native Method) [java.base:]
        at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:347) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:261) [java.base:]
        at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:216) [java.base:]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:143) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.open(NIOSequentialFile.java:98) [artemis-journal-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.Page.open(Page.java:483) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.openNewPage(PagingStoreImpl.java:1136) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.forceAnotherPage(PagingStoreImpl.java:606) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.getPageInformationForSync(JournalStorageManager.java:738) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:665) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation.run(SharedNothingLiveActivation.java:179) [artemis-server-2.14.0.jar:2.14.0]
        at java.base/java.lang.Thread.run(Thread.java:832) [java.base:]

2020-08-07 12:45:46,693 WARN  [org.apache.activemq.artemis.utils.actors.OrderedExecutor] Server locator is closed (maybe it was garbage collected): java.lang.IllegalStateException: Server locator is closed (maybe it was garbage collected)
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.assertOpen(ServerLocatorImpl.java:1844) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:644) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:545) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:524) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.cluster.ClusterController$ConnectRunnable.run(ClusterController.java:433) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.14.0.jar:2.14.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [java.base:]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [java.base:]
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.14.0.jar:2.14.0]

2020-08-07 12:45:47,407 INFO  [io.hawt.HawtioContextListener] Destroying hawtio services
2020-08-07 12:45:47,409 INFO  [io.hawt.web.AuthenticationFilter] Destroying hawtio authentication filter 2020-08-07 12:45:47,442 INFO  [org.apache.activemq.hawtio.plugin.PluginContextListener] Destroyed artemis-plugin plugin
2020-08-07 12:45:47,444 INFO  [org.apache.activemq.hawtio.branding.PluginContextListener] Destroyed activemq-branding plugin
2020-08-07 12:45:47,476 INFO  [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.14.0 [95f808d9-d641-11ea-9c48-005056073c33] stopped, uptime 1 minute
tail: /opt/mybroker-broker/log/artemis.log: file truncated
2020-08-07 12:46:07,212 INFO  [org.apache.activemq.artemis.integration.bootstrap] AMQ101000: Starting ActiveMQ Artemis Server
2020-08-07 12:46:07,242 INFO  [org.apache.activemq.artemis.core.server] AMQ221000: live Message Broker is starting with configuration Broker Configuration (clustered=true,journalDirectory=data/journal,bindingsDirectory=data/bindings,largeMessagesDirectory=data/large-messages,pagingDirectory=data/paging)
2020-08-07 12:46:09,516 INFO  [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
2020-08-07 12:46:09,542 INFO  [org.apache.activemq.artemis.core.server] AMQ221057: Global Max Size is being adjusted to 1/2 of the JVM max size (-Xmx). being defined as 5,368,709,120
2020-08-07 12:46:09,589 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
2020-08-07 12:46:09,590 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
2020-08-07 12:46:09,591 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
2020-08-07 12:46:09,592 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
2020-08-07 12:46:09,592 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
2020-08-07 12:46:09,593 INFO  [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
2020-08-07 12:46:09,776 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory /opt/mybroker-broker/data/paging/9fd77264-d6db-11ea-bb8e-005056073c33 did not have an identification file address.txt
2020-08-07 12:46:13,695 WARN  [org.apache.activemq.artemis.core.server] AMQ222035: Directory /opt/mybroker-broker/data/paging/9fd77264-d6db-11ea-bb8e-005056073c33 did not have an identification file address.txt

...当我执行以下操作时:

  1. 停止从机
  2. 重启主机
  3. 启动从机

使用 Jolokia,我看到 SLAVE 在一段时间后连接到 MASTER。

问题:

  1. ERROR 消息是什么意思?
  2. 是否有 CLI 命令来获取有关同步的一些信息?
  3. 是否有 CLI 命令来获取一些 system/cluster 状态信息?

主代理的日志中记录了基本问题:

2020-08-07 12:45:46,615 WARN  [org.apache.activemq.artemis.core.server] AMQ222251: Unable to start replication: java.io.FileNotFoundException: /opt/mybroker-broker/data/paging/9fbeba62-d6db-11ea-bb8e-005056073c33/000000003.page (Too many open files)

这是一个环境错误,表示操作系统将不允许进程(即代理)打开更多文件。

由于代理无法打开它需要的所有文件,因此它无法将其数据与备份正确同步。此外,无法打开其他文件被解释为“严重”错误,迫使代理关闭自身。当主代理关闭从属激活时,但由于主代理无法完成与从属的初始复制,从属状态:

2020-08-07 12:45:50,720 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ229026: Backup Server was not yet in sync with live]

从服务器只是简单地记录一个事实,即在主服务器关闭之前它没有完成与主服务器的初始 synchronization/replication 进程。

最终您需要增加用户 运行 经纪人的文件限制。快速网络搜索应该可以帮助您了解针对您的特定操作系统的该任务的细节。