Cannot start dask cluster over SSH
I am trying to start a dask cluster over SSH, but I get the following strange error:
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/home/localuser/miniconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/localuser/miniconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/localuser/miniconda3/lib/python3.6/site-packages/distributed/deploy/ssh.py", line 57, in async_ssh
    banner_timeout=20) # Helps prevent timeouts when many concurrent ssh connections are opened.
  File "/home/localuser/miniconda3/lib/python3.6/site-packages/paramiko/client.py", line 329, in connect
    to_try = list(self._families_and_addresses(hostname, port))
  File "/home/localuser/miniconda3/lib/python3.6/site-packages/paramiko/client.py", line 200, in _families_and_addresses
    hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
  File "/home/localuser/miniconda3/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
This is how I start the cluster:
$ dask-ssh --ssh-private-key ~/.ssh/cluster_id_rsa \
--hostfile ~/dask-hosts.txt \
--remote-python "~/miniconda3/bin/python3.6"
My dask-hosts.txt looks like this:
localuser@127.0.0.1
remoteuser@10.10.4.200
...
remoteuser@10.10.4.207
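I get the same error with or without the localhost line.
I have checked the SSH setup, and I can log in to all nodes using public-key authentication (the key is unencrypted, to avoid a passphrase prompt). What am I missing?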
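The error indicates that name resolution is the culprit. This most likely happens because your dask-hosts.txt contains usernames, so each whole `user@host` string is passed on as the hostname. According to the dask-ssh documentation, the hostfile should contain only hostnames/IP addresses:
--hostfile PATH Textfile with hostnames/IP addresses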
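To see why a `user@host` entry ends in `socket.gaierror`, here is a minimal sketch that passes hostfile entries to `socket.getaddrinfo`, the same call that appears at the bottom of the traceback. The helper name `can_resolve` is made up for illustration, and the default port 22 is an assumption. The exact errno may differ between platforms, so the sketch only reports whether resolution succeeded.

```python
import socket


def can_resolve(host, port=22):
    """Return True if socket.getaddrinfo accepts `host`, False on gaierror.

    `can_resolve` is a hypothetical helper; port 22 is assumed (default SSH port).
    """
    try:
        socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return True


# A bare IP address versus the same address with a username prefix,
# as written in the question's dask-hosts.txt.
for entry in ("127.0.0.1", "localuser@127.0.0.1"):
    print(entry, can_resolve(entry))
```

The entry with the `localuser@` prefix is treated as one hostname, which is normally not resolvable, while the bare address is.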
You can set the username with --ssh-username (only one, though).
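As a rough sketch of the cleanup, the following splits `user@host` entries into bare hosts grouped by username, so you can see what belongs in the hostfile and which usernames are in play. The function name `split_hostfile_entries` is hypothetical, and the sample entries are the ones shown in the question.

```python
from collections import defaultdict


def split_hostfile_entries(lines):
    """Group bare hosts by the username found in each 'user@host' entry.

    Entries without a username are grouped under None. Blank lines are skipped.
    (Hypothetical helper, not part of dask.)
    """
    hosts_by_user = defaultdict(list)
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        # rpartition returns ("", "", entry) when there is no "@".
        user, sep, host = entry.rpartition("@")
        hosts_by_user[user if sep else None].append(host)
    return dict(hosts_by_user)


entries = [
    "localuser@127.0.0.1",
    "remoteuser@10.10.4.200",
    "remoteuser@10.10.4.207",
]
grouped = split_hostfile_entries(entries)
for user, hosts in grouped.items():
    print(user, hosts)
```

The bare hosts are what should go into dask-hosts.txt. If more than one username shows up, as with `localuser` and `remoteuser` here, a single --ssh-username value cannot cover all of the entries.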