Will the group coordinator treat a Kafka consumer (0.9) as dead if it doesn't call poll() for a very long time?

https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html says: "As long the consumer is sending heartbeats in regular intervals, it is assumed to be alive, well and processing messages from its partitions. In fact, the act of polling for messages is what causes the consumer to send those heartbeats. If the consumer stops sending heartbeats for long enough, its session will time out and the group coordinator will consider it dead and trigger a rebalance."

Similarly, https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html states: "The broker will automatically detect failed processes in the test group by using a heartbeat mechanism. The consumer will automatically ping the cluster periodically, which lets the cluster know that it is alive. As long as the consumer is able to do this it is considered alive and retains the right to consume from the partitions assigned to it. If it stops heartbeating for a period of time longer than session.timeout.ms then it will be considered dead and its partitions will be assigned to another process."

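In my application, processing the messages received from the previous poll() can take up to several hours before the next poll() is called. Note: I have disabled auto-commit, because I don't always know how long it will take to process all of the previous messages.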

a) Will this cause the group coordinator to consider the consumer dead or inactive?

b) Is there another way to send heartbeats to the group coordinator to keep the session alive?

c) Does session.timeout.ms have any effect on keeping the consumer alive/active?

a) Yes. If you don't call poll() for longer than session.timeout.ms, Kafka considers the consumer dead.

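b) As an alternative, you can call poll() during processing (i.e., interleaved with processing) to trigger heartbeats (and seek before each "real" poll). Using an additional processing thread is also possible, which lets the main thread poll regularly to send heartbeats. However, you need to make sure that failures on the processing thread are detected (this is tricky to get right)!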

c) You can increase the timeout value. However, this might not be what you want, because if your consumer does fail, the failure is detected late.
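
The processing-thread idea from (b) can be sketched roughly as follows. This is only a minimal illustration using plain `java.util.concurrent`, not a tested Kafka integration: `KeepAlive` and `processWithKeepAlive` are hypothetical names, and `ping()` stands in for whatever keeps the session alive in a real 0.9 consumer (for example a poll() on paused partitions, or a poll() followed by a seek as described above).

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class KeepAliveWhileProcessing {

    /** Hypothetical stand-in for a poll() call that only serves to send a heartbeat. */
    interface KeepAlive {
        void ping();
    }

    /**
     * Runs work on a separate thread and calls keepAlive.ping() every
     * pingIntervalMs while the work is still running.
     * Returns true if the work completed normally, false if it threw an
     * exception or the calling thread was interrupted.
     */
    static boolean processWithKeepAlive(Runnable work, KeepAlive keepAlive, long pingIntervalMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<?> result = worker.submit(work);
            while (true) {
                try {
                    result.get(pingIntervalMs, TimeUnit.MILLISECONDS);
                    return true;                        // work finished normally
                } catch (TimeoutException stillRunning) {
                    keepAlive.ping();                   // still busy: send a heartbeat and keep waiting
                } catch (ExecutionException failed) {
                    return false;                       // worker threw: report the failure to the caller
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // restore the flag and give up
                    return false;
                }
            }
        } finally {
            worker.shutdownNow();                       // never leave the worker thread behind
        }
    }

    public static void main(String[] args) {
        AtomicInteger pings = new AtomicInteger();
        Runnable slowWork = () -> {
            try {
                Thread.sleep(300);                      // pretend to process a batch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean ok = processWithKeepAlive(slowWork, pings::incrementAndGet, 50);
        System.out.println("completed=" + ok + ", pings=" + pings.get());
    }
}
```

Note that this sketch only notices a worker that throws. A worker that hangs forever would keep the heartbeats going indefinitely, which is exactly the failure-detection problem mentioned in (b); a real implementation would probably also need an upper bound on the processing time.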

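The problem you describe is actually a known one, and the consumer's behavior might change in the future. There is already a discussion about it. See KIP-62 for details.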

Update

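Since Kafka 0.10.1, the consumer has two configuration parameters: max.poll.interval.ms and session.timeout.ms. The first is the maximum time allowed between two consecutive polls, and the second is the heartbeat timeout. Heartbeats are sent from an additional thread and are therefore now decoupled from calling poll(). As a result, increasing max.poll.interval.ms does not have the negative side effect that a failure of the whole client (no heartbeats) can no longer be detected quickly.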
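
For a consumer on 0.10.1 or later, the two settings might be combined along these lines. This is a configuration sketch only: the broker address, the group name, and the concrete timeout values are assumptions chosen to match the "several hours of processing" case from the question, and the deserializer settings that a real KafkaConsumer also requires are omitted.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

class ConsumerTimeouts {

    /** Builds consumer settings for long-running processing (values are illustrative). */
    static Properties longProcessingConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.setProperty("group.id", "slow-workers");             // hypothetical group name
        props.setProperty("enable.auto.commit", "false");          // commit manually after processing
        // Heartbeat timeout: kept short so a crashed client is detected quickly.
        props.setProperty("session.timeout.ms", "10000");
        // Maximum gap between two poll() calls: sized for hours of processing.
        props.setProperty("max.poll.interval.ms",
                String.valueOf(TimeUnit.HOURS.toMillis(3)));
        return props;
    }

    public static void main(String[] args) {
        Properties props = longProcessingConfig();
        System.out.println("session.timeout.ms=" + props.getProperty("session.timeout.ms"));
        System.out.println("max.poll.interval.ms=" + props.getProperty("max.poll.interval.ms"));
    }
}
```

The point of splitting the two values is that only max.poll.interval.ms has to cover the processing time, while session.timeout.ms can stay small.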