How to handle etcdserver: unhealthy cluster
When I run this command on the master node of an etcd cluster to add a node:
curl http://127.0.0.1:2379/v3beta/members \
-XPOST -H "Content-Type: application/json" \
-d '{"peerURLs": ["http://172.19.104.230:2380"]}'
it returns {"error":"etcdserver: unhealthy cluster","code":14}.
Then I checked the cluster status:
[root@iZuf63refzweg1d9dh94t8Z ~]# etcdctl member list
55a782166ce91d01, started, infra3, https://172.19.150.82:2380, https://172.19.150.82:2379
696a771758a889c4, started, infra1, https://172.19.104.231:2380, https://172.19.104.231:2379
The members look fine. What should I do to make it work?
According to the etcd source code, the ErrUnhealthy error is returned when the longestConnected function fails.
// longestConnected chooses the member with longest active-since-time.
// It returns false, if nothing is active.
func longestConnected(tp rafthttp.Transporter, membs []types.ID) (types.ID, bool) {
	var longest types.ID
	var oldest time.Time
	for _, id := range membs {
		tm := tp.ActiveSince(id)
		if tm.IsZero() { // inactive
			continue
		}
		if oldest.IsZero() { // first longest candidate
			oldest = tm
			longest = id
		}
		if tm.Before(oldest) {
			oldest = tm
			longest = id
		}
	}
	if uint64(longest) == 0 {
		return longest, false
	}
	return longest, true
}
So etcd cannot find a suitable member to connect to.
The cluster's VotingMemberIDs method returns the list of voting members:
transferee, ok := longestConnected(s.r.transport, s.cluster.VotingMemberIDs())
if !ok {
	return ErrUnhealthy
}

// VotingMemberIDs returns the ID of voting members in cluster.
func (c *RaftCluster) VotingMemberIDs() []types.ID {
	c.Lock()
	defer c.Unlock()
	var ids []types.ID
	for _, m := range c.members {
		if !m.IsLearner {
			ids = append(ids, m.ID)
		}
	}
	sort.Sort(types.IDSlice(ids))
	return ids
}
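The learner filter can be tried in isolation. The sketch below assumes a trimmed-down, hypothetical member type with plain uint64 IDs instead of etcd's membership.Member and types.ID, and it omits the locking.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a hypothetical, trimmed-down version of an etcd member.
type member struct {
	ID        uint64
	IsLearner bool
}

// votingMemberIDs returns the sorted IDs of all non-learner members.
func votingMemberIDs(members map[uint64]member) []uint64 {
	var ids []uint64
	for _, m := range members {
		if !m.IsLearner {
			ids = append(ids, m.ID)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	members := map[uint64]member{
		3: {ID: 3},
		1: {ID: 1},
		2: {ID: 2, IsLearner: true}, // learners do not vote
	}
	fmt.Println(votingMemberIDs(members)) // prints [1 3]
}
```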
As we can see from your report, your cluster has two members.
$ etcdctl member list
55a782166ce91d01, started, infra3, https://172.19.150.82:2380, https://172.19.150.82:2379
696a771758a889c4, started, infra1, https://172.19.104.231:2380, https://172.19.104.231:2379
So we should check the members: they need to be voting members, not learners (see etcd docs | Learner).
// RaftAttributes represents the raft related attributes of an etcd member.
type RaftAttributes struct {
	// PeerURLs is the list of peers in the raft cluster.
	// TODO(philips): ensure these are URLs
	PeerURLs []string `json:"peerURLs"`
	// IsLearner indicates if the member is raft learner.
	IsLearner bool `json:"isLearner,omitempty"`
}
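One practical consequence of the omitempty tag: when a member is a voting member, the isLearner key is absent from the JSON rather than set to false. A small sketch using a copy of the struct above; the encode helper is hypothetical and exists only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RaftAttributes is copied from the struct quoted above.
type RaftAttributes struct {
	PeerURLs  []string `json:"peerURLs"`
	IsLearner bool     `json:"isLearner,omitempty"`
}

// encode is a hypothetical helper that marshals the attributes to a string.
func encode(a RaftAttributes) string {
	b, err := json.Marshal(a)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	voter := RaftAttributes{PeerURLs: []string{"https://172.19.150.82:2380"}}
	learner := RaftAttributes{PeerURLs: []string{"http://172.19.104.230:2380"}, IsLearner: true}

	fmt.Println(encode(voter))   // no isLearner key: a voting member
	fmt.Println(encode(learner)) // isLearner is present and true
}
```

So when reading member data as JSON, a missing isLearner field should be treated as "voting member".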
So try increasing the number of members to provide a quorum.
To force a member to start as a new cluster, try ETCD_FORCE_NEW_CLUSTER="true".
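As a rough illustration of why the member count matters: a Raft quorum is a strict majority of the voting members, so a two-member cluster needs both members reachable and tolerates no failures. The helper names below are made up for this sketch.

```go
package main

import "fmt"

// quorum returns how many voting members must agree: a strict majority.
func quorum(voters int) int { return voters/2 + 1 }

// faultTolerance returns how many voting members may fail while a
// quorum can still be formed.
func faultTolerance(voters int) int { return voters - quorum(voters) }

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		fmt.Printf("voters=%d quorum=%d tolerated failures=%d\n",
			n, quorum(n), faultTolerance(n))
	}
}
```

With two voters the quorum is two, which is why any connectivity problem between the two existing members is enough to block a membership change.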