如何将搜索词捆绑到更有效的查询中?
How can I bundle search terms into more efficient queries?
我需要将搜索词列表转换为最有效的组合搜索词集。任何单词或引用的短语都可以用 OR 分隔。许多术语可以组合在括号内。也可以使用 AND。
例如,foo bar
和 boo bar
共享 bar
,因此可以合并为 (foo OR boo) AND bar
.
而不是两个不同的搜索词
这是算法需要做的事情。鉴于此数据集:
foo bar
boo bar
goo bar
hoo doo
foo manchu
moo bar
too bar
foo fighters
"blue kazoo" bar
baz
qux
quux
我想得到以下回复:
(foo OR boo OR goo OR moo OR too OR "blue kazoo") AND bar
foo AND (manchu OR fighters)
hoo doo
baz OR qux OR quux
这不起作用:
(foo bar) OR (boo bar) OR (goo bar) OR (foo manchu)
我将在 PHP 中工作,但我会使用伪代码回答,PHP 或者我将从主要语言转换。
我理解逻辑,但你真的需要把问题说清楚。
无论如何,我认为这是一个图形问题,我们想要找到具有最高度数并且可以跨越整个图形的节点集。
我相信如果你这样想,你可以使用任何你喜欢的数据结构来达到目的。您可以创建一个邻接列表,然后找到度数更高的节点,然后检查是否所有元素都被这些节点覆盖。加AND、OR的事情后面就简单了
我得到了以下代码:
function keyMultiSort(&$array,
$key,
$reverse = false,
$priority_last = false,
$save_key = true,
Callable $func = null)
{
if ($func === null)
{
$func = function ($first, $second) use ($key, $reverse, $priority_last)
{
if (!isset($first[$key]))
{
return ($reverse === false) ? -1 : 1;
}
if (!isset($second[$key]))
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] > $second[$key])
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] < $second[$key])
{
return ($reverse === false) ? -1 : 1;
}
if ($first[$key] === $second[$key])
{
return ($priority_last === false) ? 1 : -1;
}
return 0;
};
}
if ($save_key)
{
uasort($array, $func);
}
else
{
usort($array, $func);
}
}
$array = [
['foo', 'bar'],
['boo', 'bar'],
['goo', 'bar'],
['hoo', 'doo'],
['foo', 'manchu'],
['moo', 'bar'],
['too', 'bar'],
['foo', 'fighters'],
['blue kazoo', 'bar'],
];
$pairs = [];
$str = '';
foreach($array as $item)
{
if(!isset($pairs[$item[0]]['count']))
{
$pairs[$item[0]]['count'] = 1;
}
else
{
$pairs[$item[0]]['count']++;
}
$pairs[$item[0]]['elements'][] = $item[1];
if(!isset($pairs[$item[1]]['count']))
{
$pairs[$item[1]]['count'] = 1;
}
else
{
$pairs[$item[1]]['count']++;
}
$pairs[$item[1]]['elements'][] = $item[0];
keyMultiSort($pairs, 'count', true);
}
$remove = [];
foreach($pairs as $elm=>$item)
{
$remove[] = $elm;
$elements = array_diff($item['elements'], $remove);
if(empty($elements))
{
if (in_array($elm, $remove))
{
continue;
}
$str .= $elm.PHP_EOL;
}
else
{
$str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
}
$remove = array_merge($remove, $elements);
}
var_dump($str);
结果:
string(99) "bar AND (foo OR boo OR goo OR moo OR too OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo)
"
可以根据目标进行优化...
处理超过2个值的代码
<?php
function keyMultiSort(&$array,
$key,
$reverse = false,
$priority_last = false,
$save_key = true,
Callable $func = null)
{
if ($func === null)
{
$func = function ($first, $second) use ($key, $reverse, $priority_last)
{
if (!isset($first[$key]))
{
return ($reverse === false) ? -1 : 1;
}
if (!isset($second[$key]))
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] > $second[$key])
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] < $second[$key])
{
return ($reverse === false) ? -1 : 1;
}
if ($first[$key] === $second[$key])
{
return ($priority_last === false) ? 1 : -1;
}
return 0;
};
}
if ($save_key)
{
uasort($array, $func);
}
else
{
usort($array, $func);
}
}
$array = [
['foo', 'bar', 'test'],
['boo', 'bar'],
['goo', 'bar'],
['hoo', 'doo', 'test', 'test2'],
['foo', 'manchu'],
['moo', 'bar'],
['too', 'bar'],
['foo', 'fighters'],
['blue kazoo', 'bar', 'test'],
];
$pairs = [];
$str = '';
foreach($array as $item)
{
foreach($item as $key=>$elm)
{
foreach($item as $key2=>$elm2)
{
if($key !== $key2)
{
if(!isset($pairs[$elm]['count']))
{
$pairs[$elm]['count'] = 1;
}
else
{
$pairs[$elm]['count']++;
}
$pairs[$elm]['elements'][] = $elm2;
}
}
}
keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);
$remove = [];
foreach($pairs as $elm=>$item)
{
$remove[] = $elm;
$elements = array_diff($item['elements'], $remove);
if(empty($elements))
{
if (in_array($elm, $remove))
{
continue;
}
$str .= $elm.PHP_EOL;
}
else
{
$str .= $elm.' AND ('.implode(' OR ', array_unique($elements)).')'.PHP_EOL;
}
}
var_dump($str);
响应:
string(184) "bar AND (foo OR test OR boo OR goo OR moo OR too OR blue kazoo)
test AND (foo OR hoo OR doo OR test2 OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo OR test2)
doo AND (test2)
"
P.S.希望我已经正确理解了任务...
更新
添加了不忽略 "single values" 的代码。
我改变了逻辑:
...
['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],
...
return:
...
qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")
...
在我看来,对于这个任务,这是正确的(条件是组合超过 2 个值)。
function keyMultiSort(&$array,
$key,
$reverse = false,
$priority_last = false,
$save_key = true,
Callable $func = null)
{
if ($func === null)
{
$func = function ($first, $second) use ($key, $reverse, $priority_last)
{
if (!isset($first[$key]))
{
return ($reverse === false) ? -1 : 1;
}
if (!isset($second[$key]))
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] > $second[$key])
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] < $second[$key])
{
return ($reverse === false) ? -1 : 1;
}
if ($first[$key] === $second[$key])
{
return ($priority_last === false) ? 1 : -1;
}
return 0;
};
}
if ($save_key)
{
uasort($array, $func);
}
else
{
usort($array, $func);
}
}
$array = [
['foo', 'bar', 'test'],
['boo', 'bar'],
['goo', 'bar'],
['hoo', 'doo', 'test', 'test2'],
['foo', 'manchu'],
['moo', 'bar'],
['too', 'bar'],
['foo', 'fighters'],
['"blue kazoo"', 'bar', 'test'],
['"red panda"', 'bar', 'test'],
['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],
['"red panda"', 'fighters', 'moo'],
['"foo fighters"'],
['foo'],
['bar'],
];
$pairs = [];
$singles = [];
$str = '';
foreach ($array as $item)
{
foreach ($item as $key => $elm)
{
if(count($item) === 1)
{
$singles[$elm] = 1;
}
else
{
if (!isset($pairs[$elm]))
{
$pairs[$elm]['count'] = 0;
$pairs[$elm]['elements'] = [];
}
foreach ($item as $key2 => $elm2)
{
if ($key !== $key2)
{
$pairs[$elm]['count']++;
$pairs[$elm]['elements'][] = $elm2;
}
}
}
}
keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);exit;
$remove = [];
foreach ($pairs as $elm => $item)
{
$remove[] = $elm;
$elements = array_diff($item['elements'], $remove);
$elements = array_unique($elements);
if (!empty($elements)){
$str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
}
}
foreach ($singles as $elm => $item)
{
$str .= $elm.PHP_EOL;
}
var_dump($str);
回复:
string(421) "bar AND (foo OR test OR boo OR goo OR moo OR too OR "blue kazoo" OR "red panda" OR "yellow balloon" OR baz OR qut)
test AND (foo OR hoo OR doo OR test2 OR "blue kazoo" OR "red panda")
foo AND (manchu OR fighters OR "yellow balloon" OR baz OR qut)
"red panda" AND (fighters OR moo)
qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")
test2 AND (hoo OR doo)
fighters AND (moo)
doo AND (hoo)
"foo fighters"
foo
bar
"
P.S。在我看来,这个问题并不适用于现实
我需要将搜索词列表转换为最有效的组合搜索词集。任何单词或引用的短语都可以用 OR 分隔。许多术语可以组合在括号内。也可以使用 AND。
例如,foo bar
和 boo bar
共享 bar
,因此可以合并为 (foo OR boo) AND bar
.
这是算法需要做的事情。鉴于此数据集:
foo bar
boo bar
goo bar
hoo doo
foo manchu
moo bar
too bar
foo fighters
"blue kazoo" bar
baz
qux
quux
我想得到以下回复:
(foo OR boo OR goo OR moo OR too OR "blue kazoo") AND bar
foo AND (manchu OR fighters)
hoo doo
baz OR qux OR quux
这不起作用:
(foo bar) OR (boo bar) OR (goo bar) OR (foo manchu)
我将在 PHP 中工作,但我会使用伪代码回答,PHP 或者我将从主要语言转换。
我理解逻辑,但你真的需要把问题说清楚。
无论如何,我认为这是一个图形问题,我们想要找到具有最高度数并且可以跨越整个图形的节点集。
我相信如果你这样想,你可以使用任何你喜欢的数据结构来达到目的。您可以创建一个邻接列表,然后找到度数更高的节点,然后检查是否所有元素都被这些节点覆盖。加AND、OR的事情后面就简单了
我得到了以下代码:
function keyMultiSort(&$array,
$key,
$reverse = false,
$priority_last = false,
$save_key = true,
Callable $func = null)
{
if ($func === null)
{
$func = function ($first, $second) use ($key, $reverse, $priority_last)
{
if (!isset($first[$key]))
{
return ($reverse === false) ? -1 : 1;
}
if (!isset($second[$key]))
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] > $second[$key])
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] < $second[$key])
{
return ($reverse === false) ? -1 : 1;
}
if ($first[$key] === $second[$key])
{
return ($priority_last === false) ? 1 : -1;
}
return 0;
};
}
if ($save_key)
{
uasort($array, $func);
}
else
{
usort($array, $func);
}
}
$array = [
['foo', 'bar'],
['boo', 'bar'],
['goo', 'bar'],
['hoo', 'doo'],
['foo', 'manchu'],
['moo', 'bar'],
['too', 'bar'],
['foo', 'fighters'],
['blue kazoo', 'bar'],
];
$pairs = [];
$str = '';
foreach($array as $item)
{
if(!isset($pairs[$item[0]]['count']))
{
$pairs[$item[0]]['count'] = 1;
}
else
{
$pairs[$item[0]]['count']++;
}
$pairs[$item[0]]['elements'][] = $item[1];
if(!isset($pairs[$item[1]]['count']))
{
$pairs[$item[1]]['count'] = 1;
}
else
{
$pairs[$item[1]]['count']++;
}
$pairs[$item[1]]['elements'][] = $item[0];
keyMultiSort($pairs, 'count', true);
}
$remove = [];
foreach($pairs as $elm=>$item)
{
$remove[] = $elm;
$elements = array_diff($item['elements'], $remove);
if(empty($elements))
{
if (in_array($elm, $remove))
{
continue;
}
$str .= $elm.PHP_EOL;
}
else
{
$str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
}
$remove = array_merge($remove, $elements);
}
var_dump($str);
结果:
string(99) "bar AND (foo OR boo OR goo OR moo OR too OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo)
"
可以根据目标进行优化...
处理超过2个值的代码
<?php
function keyMultiSort(&$array,
$key,
$reverse = false,
$priority_last = false,
$save_key = true,
Callable $func = null)
{
if ($func === null)
{
$func = function ($first, $second) use ($key, $reverse, $priority_last)
{
if (!isset($first[$key]))
{
return ($reverse === false) ? -1 : 1;
}
if (!isset($second[$key]))
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] > $second[$key])
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] < $second[$key])
{
return ($reverse === false) ? -1 : 1;
}
if ($first[$key] === $second[$key])
{
return ($priority_last === false) ? 1 : -1;
}
return 0;
};
}
if ($save_key)
{
uasort($array, $func);
}
else
{
usort($array, $func);
}
}
$array = [
['foo', 'bar', 'test'],
['boo', 'bar'],
['goo', 'bar'],
['hoo', 'doo', 'test', 'test2'],
['foo', 'manchu'],
['moo', 'bar'],
['too', 'bar'],
['foo', 'fighters'],
['blue kazoo', 'bar', 'test'],
];
$pairs = [];
$str = '';
foreach($array as $item)
{
foreach($item as $key=>$elm)
{
foreach($item as $key2=>$elm2)
{
if($key !== $key2)
{
if(!isset($pairs[$elm]['count']))
{
$pairs[$elm]['count'] = 1;
}
else
{
$pairs[$elm]['count']++;
}
$pairs[$elm]['elements'][] = $elm2;
}
}
}
keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);
$remove = [];
foreach($pairs as $elm=>$item)
{
$remove[] = $elm;
$elements = array_diff($item['elements'], $remove);
if(empty($elements))
{
if (in_array($elm, $remove))
{
continue;
}
$str .= $elm.PHP_EOL;
}
else
{
$str .= $elm.' AND ('.implode(' OR ', array_unique($elements)).')'.PHP_EOL;
}
}
var_dump($str);
响应:
string(184) "bar AND (foo OR test OR boo OR goo OR moo OR too OR blue kazoo)
test AND (foo OR hoo OR doo OR test2 OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo OR test2)
doo AND (test2)
"
P.S.希望我已经正确理解了任务...
更新 添加了不忽略 "single values" 的代码。 我改变了逻辑:
...
['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],
...
return:
...
qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")
...
在我看来,对于这个任务,这是正确的(条件是组合超过 2 个值)。
function keyMultiSort(&$array,
$key,
$reverse = false,
$priority_last = false,
$save_key = true,
Callable $func = null)
{
if ($func === null)
{
$func = function ($first, $second) use ($key, $reverse, $priority_last)
{
if (!isset($first[$key]))
{
return ($reverse === false) ? -1 : 1;
}
if (!isset($second[$key]))
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] > $second[$key])
{
return ($reverse === false) ? 1 : -1;
}
if ($first[$key] < $second[$key])
{
return ($reverse === false) ? -1 : 1;
}
if ($first[$key] === $second[$key])
{
return ($priority_last === false) ? 1 : -1;
}
return 0;
};
}
if ($save_key)
{
uasort($array, $func);
}
else
{
usort($array, $func);
}
}
$array = [
['foo', 'bar', 'test'],
['boo', 'bar'],
['goo', 'bar'],
['hoo', 'doo', 'test', 'test2'],
['foo', 'manchu'],
['moo', 'bar'],
['too', 'bar'],
['foo', 'fighters'],
['"blue kazoo"', 'bar', 'test'],
['"red panda"', 'bar', 'test'],
['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],
['"red panda"', 'fighters', 'moo'],
['"foo fighters"'],
['foo'],
['bar'],
];
$pairs = [];
$singles = [];
$str = '';
foreach ($array as $item)
{
foreach ($item as $key => $elm)
{
if(count($item) === 1)
{
$singles[$elm] = 1;
}
else
{
if (!isset($pairs[$elm]))
{
$pairs[$elm]['count'] = 0;
$pairs[$elm]['elements'] = [];
}
foreach ($item as $key2 => $elm2)
{
if ($key !== $key2)
{
$pairs[$elm]['count']++;
$pairs[$elm]['elements'][] = $elm2;
}
}
}
}
keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);exit;
$remove = [];
foreach ($pairs as $elm => $item)
{
$remove[] = $elm;
$elements = array_diff($item['elements'], $remove);
$elements = array_unique($elements);
if (!empty($elements)){
$str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
}
}
foreach ($singles as $elm => $item)
{
$str .= $elm.PHP_EOL;
}
var_dump($str);
回复:
string(421) "bar AND (foo OR test OR boo OR goo OR moo OR too OR "blue kazoo" OR "red panda" OR "yellow balloon" OR baz OR qut)
test AND (foo OR hoo OR doo OR test2 OR "blue kazoo" OR "red panda")
foo AND (manchu OR fighters OR "yellow balloon" OR baz OR qut)
"red panda" AND (fighters OR moo)
qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")
test2 AND (hoo OR doo)
fighters AND (moo)
doo AND (hoo)
"foo fighters"
foo
bar
"
P.S。在我看来,这个问题并不适用于现实