Ambari server doesn't restart after removing node with cloudbreak
After adding a node to test scaling and then removing that node with cloudbreak, the ambari-server service won't restart.
The error at startup is:
DB configs consistency check failed. Run "ambari-server start --skip-database-check" to skip. You may try --auto-fix-database flag to attempt to fix issues automatically. If you use this "--skip-database-check" option, do not make any changes to your cluster topology or perform a cluster upgrade until you correct the database consistency issues. See /var/log/ambari-server/ambari-server-check-database.log for more details on the consistency issues.
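The log doesn't say much more. I tried restarting postgres, and sometimes that works, roughly 1 time in 10 (how is that even possible?).
So I dug deeper instead of just restarting postgres.
I opened the ambari database to take a look: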
sudo su - postgres
psql ambari -U ambari -W -p 5432
(the password is bigdata)
When I looked at the tables topology_logical_request, topology_request and topology_hostgroup, I saw that the cluster had not registered a remove request, only the add request:
ambari=> select * from topology_logical_request;
id | request_id | description
----+------------+-----------------------------------------------------------
1 | 1 | Logical Request: Provision Cluster 'sentelab-perf'
62 | 51 | Logical Request: Scale Cluster 'sentelab-perf' (+1 hosts)
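Check which ids need to be deleted (track down every request that belongs to an add-node operation) and start deleting them (the order matters):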
delete from topology_hostgroup where id = 51;
delete from topology_logical_request where id = 62;
DELETE FROM topology_request WHERE id = 51;
Exit with \q, restart ambari-server, and it works!
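For anyone wondering why the order matters: my understanding is that topology_hostgroup and topology_logical_request both point back at topology_request through request_id, so the parent row can only go once the child rows are gone. Here is a minimal sketch of that idea in Python using an in-memory SQLite database. The three-column schema and the foreign keys are a simplified assumption, not the real Ambari schema, and the helper names (make_db, parent_first_fails, delete_in_order) are made up for illustration. The ids are the ones from the output above.

```python
import sqlite3

# Simplified, hypothetical stand-in for the three Ambari tables. The real
# tables have more columns; the foreign keys below are an assumption.
SCHEMA = """
CREATE TABLE topology_request (id INTEGER PRIMARY KEY);
CREATE TABLE topology_hostgroup (
    id INTEGER PRIMARY KEY,
    request_id INTEGER NOT NULL REFERENCES topology_request(id)
);
CREATE TABLE topology_logical_request (
    id INTEGER PRIMARY KEY,
    request_id INTEGER NOT NULL REFERENCES topology_request(id),
    description TEXT
);
"""


def make_db():
    """Build an in-memory database holding the provision and scale requests."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO topology_request (id) VALUES (?)", [(1,), (51,)])
    conn.executemany(
        "INSERT INTO topology_hostgroup (id, request_id) VALUES (?, ?)",
        [(1, 1), (51, 51)],
    )
    conn.executemany(
        "INSERT INTO topology_logical_request (id, request_id, description) "
        "VALUES (?, ?, ?)",
        [
            (1, 1, "Logical Request: Provision Cluster 'sentelab-perf'"),
            (62, 51, "Logical Request: Scale Cluster 'sentelab-perf' (+1 hosts)"),
        ],
    )
    conn.commit()
    return conn


def parent_first_fails(conn):
    """Return True if deleting the parent row first is rejected by the FKs."""
    try:
        conn.execute("DELETE FROM topology_request WHERE id = 51")
    except sqlite3.IntegrityError:
        conn.rollback()
        return True
    conn.rollback()
    return False


def delete_in_order(conn):
    """Delete child rows first, then the parent, in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.execute("DELETE FROM topology_hostgroup WHERE id = 51")
        conn.execute("DELETE FROM topology_logical_request WHERE id = 62")
        conn.execute("DELETE FROM topology_request WHERE id = 51")


if __name__ == "__main__":
    db = make_db()
    print("parent-first delete rejected:", parent_first_fails(db))
    delete_in_order(db)
    print("remaining requests:", db.execute("SELECT id FROM topology_request").fetchall())
```

On the real database it is probably worth taking a dump before deleting anything, since these rows cannot be recovered afterwards.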