How to use Rank over partition by in Solr Streaming

How can I use Rank over partition by in Solr Streaming?

I have Table A (city, streetname) and need a query with rank 2, i.e. two street names per city, like this:

city1 - streetname1, streetname2
city2 - streetname1, streetname2
city3 - streetname1, streetname2

Is there any function that supports the above?

Thanks in advance.

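You can use the reduce and top functions to achieve what you want. These functions operate on the stream returned by search and let you build the same functionality that group provides in regular queries.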

reduce

The reduce function wraps an internal stream and groups tuples by common fields.

Each tuple group is operated on as a single block by a pluggable reduce operation. The group operation provided with Solr implements distributed grouping functionality. The group operation also serves as an example reduce operation that can be referred to when building custom reduce operations.

The reduce function relies on the sort order of the underlying stream. Accordingly the sort order of the underlying stream must be aligned with the group by field.
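As a rough sketch of how this could apply to the question, the Python snippet below builds a reduce expression around the group operation and then emulates its grouping logic locally. The collection name tableA and the fields city and streetname are assumptions taken from the question, the sample rows are made up, and the helper names are hypothetical. The emulation only illustrates the idea (a stream sorted by the group-by field, top n kept per group); it does not reproduce Solr's actual output format.

```python
from itertools import groupby
from operator import itemgetter


def build_reduce_expr(collection, group_field, rank_field, n):
    """Build a reduce(search(...), by=..., group(...)) expression string.

    The inner search is sorted by the group-by field first, as reduce requires.
    """
    search = (
        f'search({collection}, q="*:*", qt="/export", '
        f'fl="{group_field},{rank_field}", '
        f'sort="{group_field} asc, {rank_field} asc")'
    )
    return (
        f'reduce({search}, by="{group_field}", '
        f'group(sort="{rank_field} asc", n="{n}"))'
    )


def emulate_reduce_group(tuples, group_field, rank_field, n):
    """Local emulation only: keep the first n tuples of each group.

    Like reduce, this assumes the input is already sorted by group_field.
    """
    result = {}
    for key, members in groupby(tuples, key=itemgetter(group_field)):
        ranked = sorted(members, key=itemgetter(rank_field))
        result[key] = [m[rank_field] for m in ranked[:n]]
    return result


# Hypothetical names taken from the question.
expr = build_reduce_expr("tableA", "city", "streetname", 2)
print(expr)

# Made-up sample rows, already sorted by city.
rows = [
    {"city": "city1", "streetname": "streetname1"},
    {"city": "city1", "streetname": "streetname2"},
    {"city": "city1", "streetname": "streetname3"},
    {"city": "city2", "streetname": "streetname1"},
    {"city": "city2", "streetname": "streetname2"},
]
grouped = emulate_reduce_group(rows, "city", "streetname", 2)
print(grouped)
```

The printed expression can be sent to the collection's stream handler. Sorting the inner search by city first is what keeps the stream aligned with the by field.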

top

The top function wraps a streaming expression and re-orders the tuples. The top function emits only the top N tuples in the new sort order. The top function re-orders the underlying stream so the sort criteria does not have to match up with the underlying stream.
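To make the difference from reduce concrete, here is a small Python sketch of a top expression together with a local emulation of its behavior. The collection and field names are again assumptions from the question, the rows are made-up sample data, and emulate_top is a hypothetical helper rather than anything Solr provides. Note that top keeps the first N tuples of the whole stream, not N per group, so on its own it does not give a per-city ranking.

```python
import heapq

# Expression string only; tableA, city and streetname are assumed names.
TOP_EXPR = (
    'top(n=2, '
    'search(tableA, q="*:*", qt="/export", fl="city,streetname", '
    'sort="streetname asc"), '
    'sort="city asc")'
)


def emulate_top(tuples, n, sort_field, descending=False):
    """Local emulation only: return the n first tuples in the new sort order.

    The input does not need to be sorted by sort_field beforehand.
    """
    pick = heapq.nlargest if descending else heapq.nsmallest
    return pick(n, tuples, key=lambda t: t[sort_field])


# Made-up sample rows in arbitrary order.
rows = [
    {"city": "city2", "streetname": "streetname2"},
    {"city": "city1", "streetname": "streetname1"},
    {"city": "city3", "streetname": "streetname1"},
    {"city": "city1", "streetname": "streetname2"},
]
top_two = emulate_top(rows, 2, "city")
print(TOP_EXPR)
print(top_two)
```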

The search stream source does not accept all the parameters that a regular query accepts; it has its own subset of allowed parameters:

search Parameters

collection: (Mandatory) the collection being searched.

q: (Mandatory) The query to perform on the Solr index.

fl: (Mandatory) The list of fields to return.

sort: (Mandatory) The sort criteria.

zkHost: Only needs to be defined if the collection being searched is found in a different zkHost than the local stream handler.

qt: Specifies the query type, or request handler, to use. Set this to /export to work with large result sets. The default is /select.

rows: (Mandatory with the /select handler) The rows parameter specifies how many rows to return. This parameter is only needed with the /select handler (which is the default) since the /export handler always returns all rows.

partitionKeys: Comma delimited list of keys to partition the search results by. To be used with the parallel function for parallelizing operations across worker nodes. See the parallel function for details.
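The parameter rules above can be sketched as a small Python helper that assembles a search expression and the URL for the stream handler. The helper names, the localhost base URL, and the tableA collection are assumptions for illustration only. The check simply mirrors the rule that rows is required when the default /select handler is used.

```python
from urllib.parse import urlencode


def build_search_expr(collection, q, fl, sort, qt=None, rows=None):
    """Assemble a search(...) expression from the mandatory parameters.

    rows must be given unless qt is set to a handler other than /select.
    """
    handler = qt or "/select"  # /select is the documented default
    if handler == "/select" and rows is None:
        raise ValueError("rows is required with the /select handler")
    parts = [collection, f'q="{q}"', f'fl="{fl}"', f'sort="{sort}"']
    if qt is not None:
        parts.append(f'qt="{qt}"')
    if rows is not None:
        parts.append(f"rows={rows}")
    return "search(" + ", ".join(parts) + ")"


def build_stream_url(base_url, collection, expr):
    """URL-encode the expression as the expr parameter of /stream."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


# Assumed collection, fields and host; adjust to the actual deployment.
search_expr = build_search_expr(
    "tableA", "*:*", "city,streetname", "city asc", qt="/export"
)
stream_url = build_stream_url("http://localhost:8983/solr", "tableA", search_expr)
print(search_expr)
print(stream_url)
```

Using qt="/export" here avoids the rows requirement, since that handler returns the full result set.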