Connecting to a SolrCloud collection using pysolr
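I have configured a multi-core SolrCloud.
I created a collection with 2 shards and no replication.
Through the Solr UI at 192.168.1.56:8983 I can get results for my queries.
I want to do the same with pysolr, so I tried running the following: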
import pysolr
zookeeper = pysolr.ZooKeeper("192.168.1.56:2181,192.168.1.55:2182")
solr = pysolr.SolrCloud(zookeeper, "random_collection")
The last line fails to find the collection, even though it exists.
Here is the traceback:
---------------------------------------------------------------------------
SolrError Traceback (most recent call last)
<ipython-input-43-9f03eca3b645> in <module>()
----> 1 solr = pysolr.SolrCloud(zookeeper, "patent_colllection")
/usr/local/lib/python2.7/dist-packages/pysolr.pyc in __init__(self, zookeeper, collection, decoder, timeout, retry_timeout, *args, **kwargs)
1176
1177 def __init__(self, zookeeper, collection, decoder=None, timeout=60, retry_timeout=0.2, *args, **kwargs):
-> 1178 url = zookeeper.getRandomURL(collection)
1179
1180 super(SolrCloud, self).__init__(url, decoder=decoder, timeout=timeout, *args, **kwargs)
/usr/local/lib/python2.7/dist-packages/pysolr.pyc in getRandomURL(self, collname, only_leader)
1315
1316 def getRandomURL(self, collname, only_leader=False):
-> 1317 hosts = self.getHosts(collname, only_leader=only_leader)
1318 if not hosts:
1319 raise SolrError('ZooKeeper returned no active shards!')
/usr/local/lib/python2.7/dist-packages/pysolr.pyc in getHosts(self, collname, only_leader, seen_aliases)
1281 hosts = []
1282 if collname not in self.collections:
-> 1283 raise SolrError("Unknown collection: %s", collname)
1284 collection = self.collections[collname]
1285 shards = collection[ZooKeeper.SHARDS]
SolrError: (u'Unknown collection: %s', 'random_collection')
Solr version is 6.6.2; ZooKeeper version is 3.4.10.
How do I connect to a SolrCloud collection?
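Pysolr currently does not support an external ZooKeeper cluster. Pysolr looks up collections in clusterstate.json, but Solr now stores a separate state.json for each collection (under /collections/<name>/state.json) and leaves clusterstate.json empty. Pysolr therefore sees no collections at all and raises "Unknown collection".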
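To make the layout difference concrete, here is a minimal Python 3 sketch that needs no ZooKeeper connection. It takes hypothetical state.json payloads (the node addresses, core names and the second collection are invented for illustration) and merges them into the single collection-name to state mapping that a pre-split clusterstate.json would have held. The helper name merge_state_files is hypothetical, not pysolr API.

```python
import json

# Hypothetical contents of two per-collection znodes, /collections/<name>/state.json.
# Addresses, core names and "other_collection" are illustrative only.
RAW_STATE_FILES = {
    "random_collection": b'''{"random_collection": {"shards": {
        "shard1": {"state": "active", "replicas": {"core_node1": {
            "base_url": "http://192.168.1.56:8983/solr", "state": "active", "leader": "true"}}},
        "shard2": {"state": "active", "replicas": {"core_node2": {
            "base_url": "http://192.168.1.55:8983/solr", "state": "active", "leader": "true"}}}}}}''',
    "other_collection": b'{"other_collection": {"shards": {}}}',
}


def merge_state_files(raw_files):
    """Merge per-collection state.json payloads into one name -> state dict,
    i.e. the shape an old-style clusterstate.json would have held."""
    collections = {}
    for payload in raw_files.values():
        # Each state.json is a single-key object keyed by the collection name
        collections.update(json.loads(payload.decode("utf-8")))
    return collections


merged = merge_state_files(RAW_STATE_FILES)
print(sorted(merged))  # ['other_collection', 'random_collection']
```

Because every state.json is already keyed by its collection name, a plain dict update is enough to combine them.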
To work around this for a single collection, you can hard-code the ZooKeeper.CLUSTER_STATE variable in pysolr.py as follows:
ZooKeeper.CLUSTER_STATE = '/collections/random_collection/state.json'
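pysolr.py is located in /usr/local/lib/python2.7/dist-packages. The edit takes effect the next time pysolr is imported (restart the Python process or notebook kernel), so no reinstall is needed. Note that pip install -e would not work here anyway: it expects a project directory, not a single .py file.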
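A better trick is to supply the collections generically, by reading every collection's state.json yourself and handing the result to pysolr: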
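import json
import pysolr
# Replace ZK_STRING with your ensemble, e.g. "192.168.1.56:2181,192.168.1.55:2182"
zookeeper = pysolr.ZooKeeper("ZK_STRING")
collections = {}
# Solr keeps each collection's state in /collections/<name>/state.json
for c in zookeeper.zk.get_children("/collections"):
    data, _stat = zookeeper.zk.get("/collections/{}/state.json".format(c))
    # Each state.json is a one-key object: {"<name>": {...}}
    collections.update(json.loads(data.decode("utf-8")))
# Give pysolr the map it would normally read from clusterstate.json
zookeeper.collections = collections
solr = pysolr.SolrCloud(zookeeper, "random_collection")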
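Once the states are merged, picking a node to talk to comes down to walking shards, then replicas, and keeping the active ones; a name missing from the map is exactly the "Unknown collection" case in the traceback. The helper below is a simplified, hypothetical sketch of that lookup in Python 3 (collection_urls and the sample state are illustrative, not pysolr API), and it ignores collection aliases, which pysolr also handles.

```python
# Hypothetical merged state: two shards, no replication, shard2's replica is down
COLLECTIONS = {
    "random_collection": {
        "shards": {
            "shard1": {"state": "active", "replicas": {
                "core_node1": {"base_url": "http://192.168.1.56:8983/solr",
                               "state": "active", "leader": "true"}}},
            "shard2": {"state": "active", "replicas": {
                "core_node2": {"base_url": "http://192.168.1.55:8983/solr",
                               "state": "down", "leader": "true"}}},
        }
    }
}


def collection_urls(collections, name):
    """Return "<base_url>/<name>" for every active replica of an active shard."""
    if name not in collections:
        # Mirrors the situation in the traceback: the name is not in the map
        raise KeyError("Unknown collection: %s" % name)
    urls = []
    for shard in collections[name]["shards"].values():
        if shard.get("state") != "active":
            continue
        for replica in shard.get("replicas", {}).values():
            url = "%s/%s" % (replica["base_url"], name)
            if replica.get("state") == "active" and url not in urls:
                urls.append(url)
    return urls


print(collection_urls(COLLECTIONS, "random_collection"))
# ['http://192.168.1.56:8983/solr/random_collection']
```

With an empty map (the state pysolr ends up in when clusterstate.json is empty), the same call raises immediately.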
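The regular HTTP client, pysolr.Solr, also works fine against SolrCloud: point it at the collection URL on any node, and that node distributes the request across the collection's shards.
Tested with Solr 7.5 and PySolr 3.9.0: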
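import pysolr
# Base URL of any Solr node in the cluster
solr_url = "https://my.solr.url"
collection = "my_collection"
# Plain HTTP client pointed at <base>/solr/<collection>; no ZooKeeper involved
solr_connection = pysolr.Solr("{}/solr/{}".format(solr_url, collection), timeout=10)
results = solr_connection.search(...)
print(results.docs)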
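To see what such a plain HTTP client actually requests, the Python 3 sketch below uses only the standard library to build a /select URL for one node. The host and collection come from the question; the select_url helper and the q, rows and wt values are a minimal, hand-picked illustration, not the exact set of parameters pysolr sends. It only builds the URL and sends nothing.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Any node from the question's cluster; the collection name is the question's
NODE = "http://192.168.1.56:8983/solr"
COLLECTION = "random_collection"


def select_url(node, collection, query, rows=10):
    """Build a /select URL for a collection on a single SolrCloud node (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return "%s/%s/select?%s" % (node.rstrip("/"), collection, params)


url = select_url(NODE, COLLECTION, "*:*")
print(url)
# The query string decodes back to the original parameters
print(parse_qs(urlsplit(url).query)["q"])  # ['*:*']
```

The part before /select is the collection URL you would pass to pysolr.Solr.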