How does a queueing service like RabbitMQ operate on a cluster?

I understand how a queueing service works on a single machine, but I don't understand how it works on a cluster.

If one queue fails, is the same message sent to every queue to mitigate the loss? Or are messages load-balanced across the queues?

Also, if messages are load-balanced across the queues, does that remove the need for a service like Celery?

Thanks

From the official RabbitMQ documentation:

1 - How does a queueing service like RabbitMQ operate on a cluster?

Clustering

All data/state required for the operation of a RabbitMQ broker is replicated across all nodes. An exception to this are message queues, which by default reside on one node, though they are visible and reachable from all nodes. To replicate queues across nodes in a cluster, see the documentation on high availability (note: this guide is a prerequisite for mirroring).

Nodes are Equal Peers

Some distributed systems have leader and follower nodes. This is generally not true for RabbitMQ. All nodes in a RabbitMQ cluster are equal peers: there are no special nodes in RabbitMQ core. This topic becomes more nuanced when queue mirroring and plugins are taken into consideration but for most intents and purposes, all cluster nodes should be considered equal.
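To make the clustering description concrete, here is a minimal toy model in Python. It is not RabbitMQ code and uses no RabbitMQ API: `ToyCluster`, the node names and the `via_node` parameter are all made up for illustration. It only shows the idea from the quote: every node knows where a queue lives, while the queue's contents sit on the single node where it was declared.

```python
class ToyCluster:
    """Toy model: queue metadata is known cluster-wide, contents live on one node."""

    def __init__(self, node_names):
        # node name -> {queue name: [messages]} (contents held by that node only)
        self.contents = {name: {} for name in node_names}
        # queue name -> home node (stands in for metadata replicated to all nodes)
        self.queue_home = {}

    def _check(self, node):
        if node not in self.contents:
            raise ValueError(f"unknown node: {node}")

    def declare_queue(self, queue, via_node):
        self._check(via_node)
        # By default the contents are placed on the node where the queue is declared.
        if queue not in self.queue_home:
            self.queue_home[queue] = via_node
            self.contents[via_node][queue] = []
        return self.queue_home[queue]

    def publish(self, queue, message, via_node):
        self._check(via_node)
        # A client may talk to any node; the message ends up on the home node.
        home = self.queue_home[queue]
        self.contents[home][queue].append(message)
        return home

    def consume(self, queue, via_node):
        self._check(via_node)
        # The queue is reachable from any node, but is served by its home node.
        home = self.queue_home[queue]
        return self.contents[home][queue].pop(0)


cluster = ToyCluster(["rabbit@a", "rabbit@b", "rabbit@c"])
cluster.declare_queue("tasks", via_node="rabbit@a")
stored_on = cluster.publish("tasks", "hello", via_node="rabbit@c")
received = cluster.consume("tasks", via_node="rabbit@b")
print(stored_on, received)
```

In this sketch, losing the home node would lose the queue's contents, which is the gap that mirroring is meant to close.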

2 - If one queue fails, is the same message sent to every queue to mitigate the damage?

Queue Mirroring

By default, contents of a queue within a RabbitMQ cluster are located on a single node (the node on which the queue was declared). This is in contrast to exchanges and bindings, which can always be considered to be on all nodes. Queues can optionally be made mirrored across multiple nodes.
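For the classic mirrored queues this guide describes, mirroring is switched on with a policy rather than in the queue declaration. As a rough sketch, assuming the management plugin is enabled on its default port 15672, the Python below only builds the URL and JSON body for the management HTTP API's `PUT /api/policies/{vhost}/{name}` endpoint. It does not send anything. The policy name `ha-all`, the pattern `^ha\.` and the helper name `mirroring_policy_request` are illustrative choices, not something taken from the question.

```python
import json
from urllib.parse import quote


def mirroring_policy_request(base_url, vhost, name, pattern):
    """Build (url, body) for a policy that mirrors matching queues to all nodes."""
    # The vhost is percent-encoded in the path, so the default vhost "/" becomes %2F.
    url = f"{base_url}/api/policies/{quote(vhost, safe='')}/{quote(name, safe='')}"
    body = {
        "pattern": pattern,                 # regex matched against queue names
        "definition": {"ha-mode": "all"},   # mirror to every node in the cluster
        "apply-to": "queues",
    }
    return url, json.dumps(body)


url, body = mirroring_policy_request(
    "http://localhost:15672", "/", "ha-all", r"^ha\."
)
print(url)
print(body)
```

The same policy can be set from the command line with `rabbitmqctl set_policy`. Either way, check the mirroring guide for your RabbitMQ version before relying on it.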

Each mirrored queue consists of one master and one or more mirrors. The master is hosted on one node commonly referred as the master node. Each queue has its own master node. All operations for a given queue are first applied on the queue's master node and then propagated to mirrors. This involves enqueueing publishes, delivering messages to consumers, tracking acknowledgements from consumers and so on.

Queue mirroring implies a cluster of nodes. It is therefore not recommended for use across a WAN (though of course, clients can still connect from as near and as far as needed).

Messages published to the queue are replicated to all mirrors. Consumers are connected to the master regardless of which node they connect to, with mirrors dropping messages that have been acknowledged at the master. Queue mirroring therefore enhances availability, but does not distribute load across nodes (all participating nodes each do all the work).
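The replication behaviour quoted above can be sketched with another toy model. Again, this is an illustration in plain Python, not RabbitMQ's implementation: `ToyMirroredQueue` and its methods are hypothetical names. A publish is applied on the master and then copied to every mirror, consumers are served by the master only, and an acknowledgement at the master makes the mirrors drop their copy.

```python
class ToyMirroredQueue:
    """Toy model of one mirrored queue: a master replica plus mirror replicas."""

    def __init__(self, master, mirrors):
        self.master = master
        # node name -> list of messages held by that replica
        self.replicas = {node: [] for node in [master, *mirrors]}

    def publish(self, message):
        # Applied on the master first, then propagated to each mirror.
        self.replicas[self.master].append(message)
        for node, messages in self.replicas.items():
            if node != self.master:
                messages.append(message)

    def deliver(self):
        # Consumers are served by the master, whichever node they connected to.
        return self.replicas[self.master][0]

    def ack(self, message):
        # Once acknowledged at the master, every replica drops the message.
        for messages in self.replicas.values():
            messages.remove(message)

    def snapshot(self):
        return {node: list(messages) for node, messages in self.replicas.items()}


queue = ToyMirroredQueue("rabbit@a", ["rabbit@b", "rabbit@c"])
queue.publish("m1")
queue.publish("m2")
after_publish = queue.snapshot()
delivered = queue.deliver()
queue.ack(delivered)
after_ack = queue.snapshot()
print(after_publish)
print(after_ack)
```

Every replica holds every message and all deliveries go through the master, so in this model mirroring adds redundancy but does not spread the work. That matches the quote: it is replication for availability, not load balancing.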

If the node that hosts the queue master fails, the oldest mirror will be promoted to the new master as long as it is synchronised. Unsynchronised mirrors can be promoted, too, depending on queue mirroring parameters.
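The promotion rule can be written down as a small selection function. This is a sketch of the rule as stated in the quote, not of RabbitMQ internals: the `Mirror` type, the `joined_at` field (lower means older) and the `allow_unsynchronised` flag are hypothetical stand-ins, the last one for the "queue mirroring parameters" the guide mentions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mirror:
    node: str
    joined_at: int       # hypothetical ordering value: lower means older
    synchronised: bool


def pick_new_master(mirrors: List[Mirror],
                    allow_unsynchronised: bool = False) -> Optional[Mirror]:
    """Return the oldest eligible mirror, or None if no mirror is eligible."""
    candidates = [m for m in mirrors if m.synchronised or allow_unsynchronised]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.joined_at)


mirrors = [
    Mirror("rabbit@b", joined_at=1, synchronised=False),
    Mirror("rabbit@c", joined_at=2, synchronised=True),
    Mirror("rabbit@d", joined_at=3, synchronised=True),
]
strict = pick_new_master(mirrors)
relaxed = pick_new_master(mirrors, allow_unsynchronised=True)
print(strict.node, relaxed.node)
```

With the strict setting the oldest mirror is skipped because it is not synchronised. With the relaxed setting it wins, at the cost of possibly missing messages it never received.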

There are multiple terms commonly used to identify primary and secondary replicas in a distributed system. This guide typically uses "master" to refer to the primary replica of a queue and "mirror" for secondary replicas. However, you will find "slave" used here and there. This is because RabbitMQ CLI tools historically have been using the term "slave" to refer to secondaries. Therefore both terms are currently used interchangeably but we'd like to eventually get rid of the legacy terminology.

More information: https://www.rabbitmq.com/clustering.html