Orderer disconnections in a Hyperledger Fabric application
We have a Hyperledger Fabric application. The primary application is hosted on AWS VMs, while the DR (disaster recovery) setup is hosted on Azure VMs. Recently, the Microsoft team found that one of the DR VMs became unavailable and regained availability after about 8 minutes. According to Microsoft: "This unexpected occurrence was caused by an Azure initiated auto-recovery action. The auto-recovery action was triggered by a hardware issue on the physical node where the virtual machine was hosted. As designed, your VM was automatically moved to a different and healthy physical node to avoid further impact." The Zookeeper VM was also redeployed at the same time.
The day after this incident, we started noticing that one orderer goes down and comes back up a few seconds later. This disconnection/reconnection recurs regularly at intervals of 12 hours 10 minutes.
We noticed two things.
In the logs we get:
[orderer/consensus/kafka] startThread -> CRIT 24df [channel: testchainid] Cannot set up channel consumer = kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.
panic: [channel: testchainid] Cannot set up channel consumer = kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.

goroutine 52 [running]:
github.com/hyperledger/fabric/vendor/github.com/op/go-logging.(*Logger).Panicf(0xc4202748a0, 0x108dede, 0x31, 0xc420327540, 0x2, 0x2)
    /w/workspace/fabric-binaries-x86_64/gopath/src/github.com/hyperledger/fabric/vendor/github.com/op/go-logging/logger.go:194 +0x134
github.com/hyperledger/fabric/orderer/consensus/kafka.startThread(0xc42022cdc0)
    /w/workspace/fabric-binaries-x86_64/gopath/src/github.com/hyperledger/fabric/orderer/consensus/kafka/chain.go:261 +0xb33
created by github.com/hyperledger/fabric/orderer/consensus/kafka.(*chainImpl).Start
    /w/workspace/fabric-binaries-x86_64/gopath/src/github.com/hyperledger/fabric/orderer/consensus/kafka/chain.go:126 +0x3f
The other thing we noticed is that the logs showed 3 Kafka brokers before the VM failure event, but after the event we can only see 2 Kafka brokers in the logs.
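Both observations can be checked directly against the cluster. The panic suggests that the offset the orderer last persisted for the channel topic is no longer within the range the brokers retain (for example because the redeployed broker came back with a truncated log), and the broker count can be read from the cluster metadata. Below is a minimal sketch using the sarama client library that Fabric itself vendors; the bootstrap address kafka0:9092 is a hypothetical placeholder, and testchainid is the channel topic from the log above.

package main

import (
    "fmt"
    "log"

    "github.com/Shopify/sarama" // the Kafka client library vendored by Fabric
)

func main() {
    // Hypothetical values; replace with a real broker endpoint and channel name.
    brokers := []string{"kafka0:9092"}
    topic := "testchainid"

    client, err := sarama.NewClient(brokers, sarama.NewConfig())
    if err != nil {
        log.Fatalf("cannot connect to Kafka: %v", err)
    }
    defer client.Close()

    // Brokers currently present in the cluster metadata (we expect 3, the logs show 2).
    for _, b := range client.Brokers() {
        fmt.Printf("broker id=%d addr=%s\n", b.ID(), b.Addr())
    }

    // Fabric channel topics have a single partition (0). If the offset the orderer
    // persisted in its local ledger falls outside [oldest, newest), the consumer
    // setup fails with the "offset is outside the range" panic shown above.
    oldest, err := client.GetOffset(topic, 0, sarama.OffsetOldest)
    if err != nil {
        log.Fatalf("cannot read oldest offset: %v", err)
    }
    newest, err := client.GetOffset(topic, 0, sarama.OffsetNewest)
    if err != nil {
        log.Fatalf("cannot read newest offset: %v", err)
    }
    fmt.Printf("retained offsets for %s/0: [%d, %d)\n", topic, oldest, newest)
}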
Can someone guide me on how to resolve this issue?
Additional information: we looked at the Kafka logs from the day after the VM was redeployed and noticed the following:
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 1195725856 larger than 104857600)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:132)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:231)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:192)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:528)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:469)
at org.apache.kafka.common.network.Selector.poll(Selector.java:398)
at kafka.network.Processor.poll(SocketServer.scala:535)
at kafka.network.Processor.run(SocketServer.scala:452)
at java.lang.Thread.run(Thread.java:748)
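For reference, 104857600 bytes is the Kafka broker default for socket.request.max.bytes (100 MB), and the reported size 1195725856 is exactly what the four ASCII bytes "GET " decode to when read as a big-endian length prefix, which commonly indicates that a non-Kafka client (for example a plaintext HTTP request or a health probe) reached the broker's listener port. A quick check of that decoding, as a small Go sketch:

package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    // Kafka reads the first 4 bytes of every request as a big-endian length
    // prefix. If a plaintext HTTP request reaches the broker port, those
    // bytes are "GET ", which decodes to the size reported in the log.
    size := binary.BigEndian.Uint32([]byte("GET "))
    fmt.Println(size) // 1195725856
}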
It looks like we have a solution, but it needs to be validated. Once the solution has been verified, I will post it on this site.