RabbitMQ cluster fails when one node is not reachable

Question

I created a RabbitMQ cluster using Docker and Docker Cloud. I am running two RabbitMQ containers on two separate nodes (both hosted on AWS).

The output of rabbitmqctl cluster_status is:

Cluster status of node 'rabbit@rabbitmq-cluster-2' ...
[{nodes,[{disc,['rabbit@rabbitmq-cluster-1','rabbit@rabbitmq-cluster-2']}]},
 {running_nodes,['rabbit@rabbitmq-cluster-1','rabbit@rabbitmq-cluster-2']},
 {cluster_name,<<"rabbit@rabbitmq-cluster-1">>},
 {partitions,[]}]

However, when I stop one container/node, my messages cannot be delivered and are queued in the .dlx.

I am using senecajs with NodeJS.

Has anyone run into the same problem and can point me in the right direction?

Answer 1

Answering my own question:

The problem was that Docker caches DNS lookups after startup and cannot connect to a new host. So if one cluster node fails, Docker keeps trying to connect to that same node instead of trying another one.

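The solution was to write my own function for connecting to RabbitMQ. I first use net.createConnection to check whether the host is online. If it is, I connect to it; if not, I try another one.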

Every time a RabbitMQ node goes down, my service fails, restarts, and calls the "try this host" function.
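
The check described above could look roughly like the following. This is only a minimal sketch, not the exact code I used: the function names isReachable and pickReachableHost, the 2-second timeout, and the shape of the candidate list are assumptions made for illustration. It opens a plain TCP connection with net.createConnection and returns the first host that accepts it.

```javascript
const net = require('net');

// Resolves to true if a TCP connection to host:port succeeds within
// timeoutMs, false on error or timeout. (Hypothetical helper name.)
function isReachable(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    let settled = false;

    const finish = (ok) => {
      if (settled) return;
      settled = true;
      socket.destroy(); // only probing, so drop the socket right away
      resolve(ok);
    };

    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.on('error', () => finish(false));
  });
}

// Tries the candidates in order and resolves to the first reachable
// { host, port }, or null if none of them answers.
// (Hypothetical helper name.)
async function pickReachableHost(candidates, timeoutMs = 2000) {
  for (const { host, port } of candidates) {
    if (await isReachable(host, port, timeoutMs)) {
      return { host, port };
    }
  }
  return null;
}
```

With the two nodes from the question this would be called as pickReachableHost([{ host: 'rabbitmq-cluster-1', port: 5672 }, { host: 'rabbitmq-cluster-2', port: 5672 }]), assuming the default AMQP port 5672 is in use, and the result is then passed to whatever opens the actual RabbitMQ connection. Note that a successful TCP connect only shows that the port is open, not that the broker is healthy.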

rabbitmq

node.js

docker

seneca