How to clean up an array of arrays with semi-duplicate values based on keys in PHP?

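Suppose we are doing some kind of scraping and end up with duplicate and semi-duplicate results.

Given an input array that might look something like this: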

$inputArr = [
  [
    'title' => 'Test0',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test5',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test0',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test0.5',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test1',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test1',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test1.5',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test3',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test2',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test3.75',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test3.25',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test2',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test3',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test5',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test3.5',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test4',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test5',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test4.5',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test4',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test5',
    'desc'  => 'Much Longer Than Short Desc',
  ],
];

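The resulting array must contain only one array per title value: the one whose desc is the longest string. Where several arrays with the same title have desc values of equal length, all but one of them must be removed.

For example, the final output should look like this: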

$resultArr = [
  [
    'title' => 'Test0',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test0.5',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test1',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test1.5',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test2',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test3',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test3.25',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test3.5',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test3.75',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test4',
    'desc'  => 'Much Longer Than Short Desc',
  ],
  [
    'title' => 'Test4.5',
    'desc'  => 'Short Desc',
  ],
  [
    'title' => 'Test5',
    'desc'  => 'Much Longer Than Short Desc',
  ],
];

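I have tried a few different solutions, but I don't like any of them. However I approach it, the result feels like a mess, and I think I'm missing an obvious and elegant solution.

I'm sure someone will suggest something cleaner than the sorting, looping, and filtering I have tried.

You can do it like this: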

$result = [];

foreach ($inputArr as $item) {
    // Skip this item if a strictly longer desc is already stored for its title.
    if (isset($result[$item['title']]) && strlen($result[$item['title']]['desc']) > strlen($item['desc'])) {
        continue;
    }

    $result[$item['title']] = $item;
}

$result = array_values($result);

print_r($result);


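This builds a new associative array that uses the title as its key. It loops over the original array: when the key already exists and the stored desc is longer than the current one, the current item is skipped; otherwise the current item is stored under that title, replacing any earlier one. array_values then discards the title keys and reindexes the result.

You can also use array_reduce: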
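The loop keeps titles in order of first appearance, whereas the expected output in the question appears to be ordered by title. As a minimal sketch, assuming that ordering is wanted, the same idea can be wrapped in a function that sorts the keys before reindexing. The function name dedupeByLongestDesc is hypothetical, and on ties this version keeps the first item seen rather than the last.

```php
<?php
// Hypothetical helper: keep one item per title, the one with the longest desc.
function dedupeByLongestDesc(array $items): array
{
    $result = [];

    foreach ($items as $item) {
        $title = $item['title'];

        // Keep the stored item when its desc is at least as long (first one wins on ties).
        if (isset($result[$title]) && strlen($result[$title]['desc']) >= strlen($item['desc'])) {
            continue;
        }

        $result[$title] = $item;
    }

    // Assumption: the caller wants titles in plain string order, as in the expected output.
    ksort($result, SORT_STRING);

    return array_values($result);
}

$sample = [
    ['title' => 'Test5', 'desc' => 'Short Desc'],
    ['title' => 'Test0', 'desc' => 'Short Desc'],
    ['title' => 'Test0', 'desc' => 'Much Longer Than Short Desc'],
];

print_r(dedupeByLongestDesc($sample));
```

A plain string sort is used here because a natural-order comparison may place 'Test3.5' before 'Test3.25', which would not match the order shown in the question.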

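$result = array_reduce($inputArr, function ($c, $i) {
    // Store the item when its title is new or its desc is longer than the stored one.
    if (!isset($c[$i['title']]) || strlen($c[$i['title']]['desc']) < strlen($i['desc'])) {
        $c[$i['title']] = $i;
    }

    return $c;
}, []); // start from an empty array rather than the default null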


$result = array_values($result);

print_r($result);
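One caveat for scraped text: strlen counts bytes, not characters, so a desc with multibyte characters can look longer than it reads. The following is only a sketch of the array_reduce approach with character counts, assuming the mbstring extension is available and the strings are UTF-8. The function name keepLongestDescPerTitle is hypothetical.

```php
<?php
// Hypothetical helper; assumes the mbstring extension and UTF-8 input.
function keepLongestDescPerTitle(array $items): array
{
    $byTitle = array_reduce($items, function (array $carry, array $item): array {
        $title = $item['title'];

        // Replace the stored item only when the new desc has strictly more characters.
        if (!isset($carry[$title])
            || mb_strlen($carry[$title]['desc'], 'UTF-8') < mb_strlen($item['desc'], 'UTF-8')) {
            $carry[$title] = $item;
        }

        return $carry;
    }, []);

    // Drop the title keys and reindex from 0.
    return array_values($byTitle);
}

print_r(keepLongestDescPerTitle([
    ['title' => 'Test0', 'desc' => 'éé'],  // 2 characters, 4 bytes
    ['title' => 'Test0', 'desc' => 'abc'], // 3 characters, 3 bytes
]));
```

With byte counts the first item would win. With character counts the second one does.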