Ruby - Show deltas between 2 arrays of hashes based on a subset of hash keys

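I'm trying to compare two arrays of hashes with very similar hash structures (identical, always-present keys) and return the deltas between the two. Specifically, I would like to capture the following:

- hashes that are in array1 but not in array2
- hashes that are in array2 but not in array1
- hashes that appear in both arrays

This can normally be achieved by simply doing: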

deltas_old_new = (array1-array2)
deltas_new_old = (array2-array1)

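My problem (which has turned into a 2-3 hour struggle!) is that I need to identify the deltas based on the values of only 3 keys in each hash ('id', 'ref', 'name'). The values of these 3 keys are effectively what makes an entry unique in my data. However, I must retain the other key/value pairs of each hash (e.g. 'extra' and many other key/value pairs not shown for brevity).

Example data: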

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

Expected results (3 separate arrays of hashes):

Object containing data in array1 but not in array2 --

[{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
 {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

Object containing data in array2 but not in array1 --

[{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
 {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
 {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

Object containing data in both array1 and array2 --

[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
 {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]

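I have made numerous attempts at comparing by iterating over the arrays and using Hash#keep_if based on the 3 keys, as well as merging both data sets into a single array and then trying to filter it based on array1, but I keep coming up empty-handed. Thank you in advance for your time and help!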

See Array#- and Array#&:

array1 - array2   #data in array1 but not in array2
array2 - array1   #data in array2 but not in array1
array1 & array2   #data in both array1 and array2

Given how you have tagged this question, you can similarly use sets:

require 'set'

set1 = array1.to_set
set2 = array2.to_set

set1 - set2
set2 - set1
set1 & set2
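Note that Array#-, Array#& and the Set operators compare entire hashes, so two hashes that share 'id', 'ref' and 'name' but differ in 'extra' would count as different. The following is a minimal sketch of a keyed variant, not a definitive implementation: the names identity and keyed_deltas are hypothetical, and it assumes that the identifying keys are always present.

```ruby
require 'set'

# Hypothetical helper: the tuple of values that identifies a record.
def identity(hash, keys)
  hash.values_at(*keys)
end

# Returns [only_in_first, only_in_second, in_both]. Whole hashes are kept,
# but membership is decided by the identifying keys alone.
# Hashes in in_both are taken from the first array.
def keyed_deltas(first, second, keys = %w[id ref name])
  ids1 = first.map  { |h| identity(h, keys) }.to_set
  ids2 = second.map { |h| identity(h, keys) }.to_set

  only_in_first, in_both = first.partition { |h| !ids2.include?(identity(h, keys)) }
  only_in_second = second.reject { |h| ids1.include?(identity(h, keys)) }
  [only_in_first, only_in_second, in_both]
end

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

only1, only2, both = keyed_deltas(array1, array2)
p only1.map { |h| h['id'] }  # => ["2", "7"]
p only2.map { |h| h['id'] }  # => ["8", "5", "12"]
p both.map  { |h| h['id'] }  # => ["1", "3"]
```

Looking the tuples up in a Set keeps each membership check cheap, which may matter if the arrays are large.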

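This isn't very pretty, but it works. It creates a third array containing all the unique values from both array1 and array2 and iterates over it.

Then, since include? doesn't allow a custom matcher, we can fake it by using detect and looking for an item in the array with a custom sub-hash match. We'll wrap that in a custom method so we can call it with either array1 or array2 instead of writing it twice.

Finally, we loop through array3, determine whether each item came from array1, array2, or both, and add it to the corresponding output array.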

array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
          {'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
          {'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
          {'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
          {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

# combine the arrays into 1 array that contains items in both array1 and array2 to loop through
array3 = (array1 + array2).uniq { |item| { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } }

# Array#include? doesn't allow a custom matcher, so we can fake it by using Array#detect
def is_included_in(array, object)
  object_identifier = { 'id' => object['id'], 'ref' => object['ref'], 'name' => object['name'] }

  array.detect do |item|
    { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } == object_identifier
  end
end

# output array initializing
array1_only = []
array2_only = []
array1_and_array2 = []

# loop through all items in both array1 and array2 and check if it was in array1 or array2
# if it was in both, add to array1_and_array2, otherwise, add it to the output array that
# corresponds to the input array
array3.each do |item|
  in_array1 = is_included_in(array1, item)
  in_array2 = is_included_in(array2, item)

  if in_array1 && in_array2
    array1_and_array2.push item
  elsif in_array1
    array1_only.push item
  else
    array2_only.push item
  end
end


puts array1_only.inspect        # => [{"id"=>"2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
puts array2_only.inspect        # => [{"id"=>"8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"}, {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
puts array1_and_array2.inspect  # => [{"id"=>"1", "ref"=>"1001", "name"=>"CA", "extra"=>"Not Sorted On 5"}, {"id"=>"3", "ref"=>"1003", "name"=>"WA", "extra"=>"Not Sorted On 9"}]
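The identifier hash above is built by hand in three places. As a sketch only, assuming Ruby 2.5 or later (where Hash#slice is available), the same matching can be written more compactly. The names identifier and included_in? are hypothetical and not part of the code above.

```ruby
# Hypothetical helper: the part of a hash that identifies it.
# Hash#slice needs Ruby 2.5 or later.
def identifier(item, keys = %w[id ref name])
  item.slice(*keys)
end

# Like Array#include?, but compares only the identifying keys.
def included_in?(array, object, keys = %w[id ref name])
  wanted = identifier(object, keys)
  array.any? { |item| identifier(item, keys) == wanted }
end

stored = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' }]
probe  = { 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'some other value' }

p stored.include?(probe)       # => false, because 'extra' differs
p included_in?(stored, probe)  # => true, because only id/ref/name are compared
```

Using any? returns a plain true or false, whereas detect returns the matching item or nil; either works in the conditionals above.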

For this type of problem it is generally easiest to work with indices.

Code

def keepers(array1, array2, keys)
  a1 = make_hash(array1, keys)
  a2 = make_hash(array2, keys)
  common_keys_of_a1_and_a2 = a1.keys & a2.keys
  [keeper_idx(array1, a1, common_keys_of_a1_and_a2),
   keeper_idx(array2, a2, common_keys_of_a1_and_a2)]
end

def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g,i),h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end

def keeper_idx(arr, a, common_keys_of_a1_and_a2)
  arr.size.times.to_a - a.values_at(*common_keys_of_a1_and_a2).flatten
end

Example

array1 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
   {'id' =>  '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
   {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

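Note that these two arrays are slightly different from those given in the question. The question did not specify whether an array could contain two hashes that have the same values for the specified keys. I therefore added one such hash to each array to show that this case is handled.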

keys = ['id', 'ref', 'name']

idx1, idx2 = keepers(array1, array2, keys)
  #=> [[1, 4], [2, 3, 4, 5]]

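idx1 (idx2) holds the indices of the elements of array1 (array2) that remain after matches have been removed. (array1 and array2 are not modified, however.)

It follows that the two arrays map to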

array1.values_at(*idx1)
  #=> [{"id"=> "2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
  #    {"id"=> "7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]

array2.values_at(*idx2)
  #=> [{"id"=> "8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
  #    {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"},
  #    {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 12"},
  #    {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]

The indices of the hashes that were removed are as follows.

array1.size.times.to_a - idx1
  #=> [0, 2, 3]
array2.size.times.to_a - idx2
  #=> [0, 1]

Explanation

The steps are as follows.

a1 = make_hash(array1, keys)
  #=> {["1", "1001", "CA"]=>[0], ["2", "1002", "NY"]=>[1],
  #    ["3", "1003", "WA"]=>[2, 3], ["7", "1007", "OR"]=>[4]}    
a2 = make_hash(array2, keys)
  #=> {["1", "1001", "CA"]=>[0], ["3", "1003", "WA"]=>[1],
  #    ["8", "1002", "NY"]=>[2], ["5", "1005", "MT"]=>[3, 4],
  #    ["12", "1012", "TX"]=>[5]}
common_keys_of_a1_and_a2 = a1.keys & a2.keys
  #=> [["1", "1001", "CA"], ["3", "1003", "WA"]]
keeper_idx(array1, a1, common_keys_of_a1_and_a2)
  #=> [1, 4] (for array1)
keeper_idx(array2, a2, common_keys_of_a1_and_a2)
  #=> [2, 3, 4, 5] (for array2)
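The steps above can also be condensed so that one call returns both the kept and the matched indices, which gives the third array the question asks for (hashes present in both). This is a sketch under assumptions, not a replacement for the code above: index_by_keys and split_indices are hypothetical names, and the result layout is an illustrative choice.

```ruby
# Hypothetical helper: maps each tuple of key values to the indices where it occurs.
def index_by_keys(arr, keys)
  arr.each_index.group_by { |i| arr[i].values_at(*keys) }
end

# Returns the indices kept in each array and the indices that were matched.
def split_indices(array1, array2, keys)
  h1 = index_by_keys(array1, keys)
  h2 = index_by_keys(array2, keys)
  common = h1.keys & h2.keys
  matched1 = h1.values_at(*common).flatten.sort
  matched2 = h2.values_at(*common).flatten.sort
  { only1: array1.each_index.to_a - matched1,
    only2: array2.each_index.to_a - matched2,
    both1: matched1,
    both2: matched2 }
end

array1 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
   {'id' =>  '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]

array2 =
  [{'id' =>  '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
   {'id' =>  '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
   {'id' =>  '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
   {'id' =>  '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
   {'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]

result = split_indices(array1, array2, ['id', 'ref', 'name'])
p result
  #=> {:only1=>[1, 4], :only2=>[2, 3, 4, 5], :both1=>[0, 2, 3], :both2=>[0, 1]}
  #   (Ruby 3.4 and later print the hash as {only1: [1, 4], ...})

in_both = array1.values_at(*result[:both1])  # the matched hashes, taken from array1
```

Because matched hashes can differ in their other keys (such as 'extra'), both1 and both2 are returned separately so the caller can decide which array's copies to keep.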