Ruby - Show deltas between two arrays of hashes based on a subset of hash keys
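I am trying to compare two arrays of hashes with very similar hash structures (identical, always-present keys) and return the deltas between the two. Specifically, I would like to capture the following:
- hashes in array1 that do not exist in array2
- hashes in array2 that do not exist in array1
- hashes that appear in both data sets
This can typically be achieved by simply doing the following: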
deltas_old_new = (array1-array2)
deltas_new_old = (array2-array1)
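My problem (which has turned into a 2-3 hour struggle!) is that I need to identify the deltas based on the values of 3 keys within the hash ('id', 'ref', 'name'). The values of these 3 keys are effectively what makes an entry unique in my data, but I must retain the other key/value pairs of the hash (e.g. 'extra' and numerous other key/value pairs not shown for brevity).
Example data: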
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Expected outcome (3 separate arrays of hashes):
Object containing data in array1 but not in array2 --
[{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
Object containing data in array2 but not in array1 --
[{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
Object containing data in both array1 and array2 --
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'}]
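I have made numerous attempts at comparing by iterating over the arrays and using Hash#keep_if based on the 3 keys, as well as merging both data sets into a single array and then trying to work from array1, but I keep coming up empty-handed. Thank you in advance for your time and assistance!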
array1 - array2 #data in array1 but not in array2
array2 - array1 #data in array2 but not in array1
array1 & array2 #data in both array1 and array2
Since you have tagged this question set, you could use sets similarly:
require 'set'
set1 = array1.to_set
set2 = array2.to_set
set1 - set2
set2 - set1
set1 & set2
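Note that - and & on arrays (and on sets) compare whole hashes, so two entries with the same 'id', 'ref' and 'name' but a different 'extra' would not count as a match. A minimal sketch of a key-based variant follows. The method name delta_by_keys, its keys parameter and the shortened sample data are hypothetical and only for illustration.

```ruby
require 'set'

# Hypothetical helper: splits two arrays of hashes into
# [only_in_first, only_in_second, in_both], comparing only the given keys.
def delta_by_keys(first, second, keys)
  # Tuples of identifying values in each array, e.g. ["1", "1001", "CA"]
  first_ids  = first.map  { |h| h.values_at(*keys) }.to_set
  second_ids = second.map { |h| h.values_at(*keys) }.to_set

  only_first  = first.reject  { |h| second_ids.include?(h.values_at(*keys)) }
  only_second = second.reject { |h| first_ids.include?(h.values_at(*keys)) }
  # For matches, the copy from the first array is kept (an arbitrary choice).
  in_both     = first.select  { |h| second_ids.include?(h.values_at(*keys)) }

  [only_first, only_second, in_both]
end

# Shortened sample data; 'extra' differs for id 1 on purpose.
array1 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'A' },
          { 'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'B' }]
array2 = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'changed' },
          { 'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'B' }]

only1, only2, both = delta_by_keys(array1, array2, %w[id ref name])

p only1.map { |h| h['id'] } # => ["2"]
p only2.map { |h| h['id'] } # => ["8"]
p both.map  { |h| h['id'] } # => ["1"]
p array1 & array2           # => [] (whole-hash comparison misses id 1)
```

Building the sets once keeps each membership check cheap, which may matter for larger arrays.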
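It's not very pretty, but it works. It creates a third array containing all unique values from both array1 and array2 and iterates through that.
Then, since include? doesn't allow a custom matcher, we can fake it by using detect and looking for an item in the array that has a custom sub-hash match. We'll wrap that in a custom method so we can call it with either array1 or array2 instead of writing it twice.
Finally, we loop through array3, determine whether each item came from array1, array2, or both, and add it to the corresponding output array.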
array1 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 = [{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
# combine the arrays into 1 array that contains items in both array1 and array2 to loop through
array3 = (array1 + array2).uniq { |item| { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } }
# Array#include? doesn't allow a custom matcher, so we can fake it by using Array#detect
def is_included_in(array, object)
  object_identifier = { 'id' => object['id'], 'ref' => object['ref'], 'name' => object['name'] }
  array.detect do |item|
    { 'id' => item['id'], 'ref' => item['ref'], 'name' => item['name'] } == object_identifier
  end
end
# output array initializing
array1_only = []
array2_only = []
array1_and_array2 = []
# loop through all items in both array1 and array2 and check if it was in array1 or array2
# if it was in both, add to array1_and_array2, otherwise, add it to the output array that
# corresponds to the input array
array3.each do |item|
  in_array1 = is_included_in(array1, item)
  in_array2 = is_included_in(array2, item)
  if in_array1 && in_array2
    array1_and_array2.push item
  elsif in_array1
    array1_only.push item
  else
    array2_only.push item
  end
end
puts array1_only.inspect # => [{"id"=>"2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
puts array2_only.inspect # => [{"id"=>"8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"}, {"id"=>"5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"}, {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
puts array1_and_array2.inspect # => [{"id"=>"1", "ref"=>"1001", "name"=>"CA", "extra"=>"Not Sorted On 5"}, {"id"=>"3", "ref"=>"1003", "name"=>"WA", "extra"=>"Not Sorted On 9"}]
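The helper above hard-codes the three identifying keys. As a small sketch, the same check can take the keys as a parameter and return a plain boolean. The name included_by_keys? and the sample list are hypothetical, not part of the answer's code.

```ruby
# Hypothetical variant of is_included_in: the identifying keys are passed in,
# and Array#any? returns true/false instead of the matching item.
def included_by_keys?(array, object, keys)
  identifier = object.values_at(*keys)
  array.any? { |item| item.values_at(*keys) == identifier }
end

keys = %w[id ref name]
list = [{ 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' }]

# Same id/ref/name, different 'extra': still counts as included.
p included_by_keys?(list, { 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'other' }, keys) # => true
# Different id: not included.
p included_by_keys?(list, { 'id' => '8', 'ref' => '1001', 'name' => 'CA' }, keys)                     # => false
```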
For this kind of problem it is usually easiest to work with indices.
Code
def keepers(array1, array2, keys)
  a1 = make_hash(array1, keys)
  a2 = make_hash(array2, keys)
  common_keys_of_a1_and_a2 = a1.keys & a2.keys
  [keeper_idx(array1, a1, common_keys_of_a1_and_a2),
   keeper_idx(array2, a2, common_keys_of_a1_and_a2)]
end

def make_hash(arr, keys)
  arr.each_with_index.with_object({}) do |(g,i),h|
    (h[g.values_at(*keys)] ||= []) << i
  end
end

def keeper_idx(arr, a, common_keys_of_a1_and_a2)
  arr.size.times.to_a - a.values_at(*common_keys_of_a1_and_a2).flatten
end
Example
array1 =
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8'},
{'id' => '7', 'ref' => '1007', 'name' => 'OR', 'extra' => 'Not Sorted On 11'}]
array2 =
[{'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5'},
{'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9'},
{'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 10'},
{'id' => '5', 'ref' => '1005', 'name' => 'MT', 'extra' => 'Not Sorted On 12'},
{'id' => '12', 'ref' => '1012', 'name' => 'TX', 'extra' => 'Not Sorted On 85'}]
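Note that these two arrays differ slightly from those given in the question. The question did not specify whether each array could contain two hashes that have the same values for the specified keys. I therefore added one hash to each array to show that this case is handled.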
keys = ['id', 'ref', 'name']
idx1, idx2 = keepers(array1, array2, keys)
#=> [[1, 4], [2, 3, 4, 5]]
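idx1 (idx2) holds the indices of the elements of array1 (array2) that remain after the matches are removed. (array1 and array2 are not modified, however.)
It follows that the two arrays are mapped to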
array1.values_at(*idx1)
#=> [{"id"=> "2", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
# {"id"=> "7", "ref"=>"1007", "name"=>"OR", "extra"=>"Not Sorted On 11"}]
and
array2.values_at(*idx2)
#=> [{"id"=> "8", "ref"=>"1002", "name"=>"NY", "extra"=>"Not Sorted On 7"},
# {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 10"},
# {"id"=> "5", "ref"=>"1005", "name"=>"MT", "extra"=>"Not Sorted On 12"},
# {"id"=>"12", "ref"=>"1012", "name"=>"TX", "extra"=>"Not Sorted On 85"}]
The indices of the hashes that are removed are as follows.
array1.size.times.to_a - idx1
#=> [0, 2, 3]
array2.size.times.to_a - idx2
#=> [0, 1]
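The removed indices also give the third result the question asks for, the hashes present in both arrays. A minimal sketch follows, restating the index idea so that it runs on its own. The name index_by_keys and the shortened sample arrays are hypothetical.

```ruby
# Hypothetical restatement of the index idea: map each tuple of key values
# to the list of indices at which it occurs.
def index_by_keys(arr, keys)
  arr.each_with_index.with_object({}) do |(h, i), idx|
    (idx[h.values_at(*keys)] ||= []) << i
  end
end

keys = %w[id ref name]
array1 = [
  { 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' },
  { 'id' => '2', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7' },
  { 'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9' },
  { 'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 8' }
]
array2 = [
  { 'id' => '1', 'ref' => '1001', 'name' => 'CA', 'extra' => 'Not Sorted On 5' },
  { 'id' => '3', 'ref' => '1003', 'name' => 'WA', 'extra' => 'Not Sorted On 9' },
  { 'id' => '8', 'ref' => '1002', 'name' => 'NY', 'extra' => 'Not Sorted On 7' }
]

a1 = index_by_keys(array1, keys)
a2 = index_by_keys(array2, keys)
common = a1.keys & a2.keys

# Indices of array1 whose key tuple also occurs in array2, in original order.
matched_idx1 = a1.values_at(*common).flatten.sort
in_both = array1.values_at(*matched_idx1)

p matched_idx1                   # => [0, 2, 3]
p in_both.map { |h| h['extra'] } # => ["Not Sorted On 5", "Not Sorted On 9", "Not Sorted On 8"]
```

Here the matching hashes are taken from array1, so both array1 entries for id 3 are kept. Taking them from array2 instead would be an equally valid choice.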
Explanation
The steps are as follows.
a1 = make_hash(array1, keys)
#=> {["1", "1001", "CA"]=>[0], ["2", "1002", "NY"]=>[1],
# ["3", "1003", "WA"]=>[2, 3], ["7", "1007", "OR"]=>[4]}
a2 = make_hash(array2, keys)
#=> {["1", "1001", "CA"]=>[0], ["3", "1003", "WA"]=>[1],
# ["8", "1002", "NY"]=>[2], ["5", "1005", "MT"]=>[3, 4],
# ["12", "1012", "TX"]=>[5]}
common_keys_of_a1_and_a2 = a1.keys & a2.keys
#=> [["1", "1001", "CA"], ["3", "1003", "WA"]]
keeper_idx(array1, a1, common_keys_of_a1_and_a2)
#=> [1, 4] (for array1)
keeper_idx(array2, a2, common_keys_of_a1_and_a2)
#=> [2, 3, 4, 5] (for array2)