Ambari server doesn't restart after removing node with cloudbreak

After adding a node to test scaling and then removing that node with cloudbreak, the ambari-server service no longer restarts.

The error at startup is:

DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.

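The log doesn't tell me much more. I tried restarting postgres, and sometimes that works, roughly 1 time in 10 (how is that even possible?).

So I dug deeper instead of just restarting postgres.

I opened the ambari database to take a look: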

sudo su - postgres
psql ambari -U ambari -W -p 5432
(the password is bigdata)

When I looked at the tables topology_logical_request, topology_request and topology_hostgroup, I saw that the cluster had not registered any remove request, only the add request:

ambari=> select * from topology_logical_request;
 id | request_id |                        description
----+------------+-----------------------------------------------------------
  1 |          1 | Logical Request: Provision Cluster 'sentelab-perf'
 62 |         51 | Logical Request: Scale Cluster 'sentelab-perf' (+1 hosts)
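
The other two tables can be inspected the same way before changing anything. This is a minimal sketch that uses only the table names mentioned above; their column layout is not shown in this post, so SELECT * is used rather than guessing column names:

```sql
-- List everything in the two remaining topology tables.
-- Look for rows that belong to the scale request (request 51 in this cluster).
SELECT * FROM topology_request;
SELECT * FROM topology_hostgroup;
```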

Work out which ids to delete (find every request that tracks an add-node operation) and delete them (the order matters):

DELETE FROM topology_hostgroup WHERE id = 51;
DELETE FROM topology_logical_request WHERE id = 62;
DELETE FROM topology_request WHERE id = 51;
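
Since these deletes edit Ambari's database by hand, it may be safer to run them inside a transaction so they can be undone if the result looks wrong. This is a sketch, not an official Ambari procedure. The ids 51 and 62 come from this cluster and will differ on yours, and the order is presumably dictated by foreign keys pointing at topology_request:

```sql
-- Run interactively in psql; nothing is permanent until COMMIT.
BEGIN;

-- Dependent rows first, then the parent request (same order as above).
DELETE FROM topology_hostgroup WHERE id = 51;
DELETE FROM topology_logical_request WHERE id = 62;
DELETE FROM topology_request WHERE id = 51;

-- Check what is left; only the provisioning request should remain.
SELECT id, request_id, description FROM topology_logical_request;

-- Keep the changes. Type ROLLBACK; here instead to undo all three deletes.
COMMIT;
```

Each DELETE prints a row count (for example DELETE 1), which is a quick way to confirm that the id matched a row.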

Exit with \q, restart ambari-server, and it works!