How can I transpose an array of hashes in Ruby?
I have the following array of hashes as input:
input = [
{"ID"=>"100", "Key"=>"Field A", "Value"=>"123"},
{"ID"=>"100", "Key"=>"Field B", "Value"=>"333"},
{"ID"=>"100", "Key"=>"Field C", "Value"=>"555"},
{"ID"=>"200", "Key"=>"Field A", "Value"=>"789"},
{"ID"=>"200", "Key"=>"Field B", "Value"=>"999"},
{"ID"=>"200", "Key"=>"Field D", "Value"=>"444"}
]
I want to transform this array of hashes as follows:
output = [
{"ID"=>"100", "Field A"=>"123", "Field B"=>"333", "Field C" => "555", "Field D" => ""},
{"ID"=>"200", "Field A"=>"789", "Field B"=>"999", "Field C" => "", "Field D" => "444"}
]
I can get the unique IDs and keys as follows:
irb(main):099:0> unique_id = input.map { |p| p["ID"] }.uniq
=> ["100", "200"]
irb(main):100:0> unique_keys = input.map { |p| p["Key"] }.uniq
=> ["Field A", "Field B", "Field C", "Field D"]
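However, I can't work out how to go on and build, for each ID, a single hash containing the key/value pairs defined in the input hashes.
Something like this might do the job: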
keys = input.map { |hash| hash['Key'] }.uniq
result = Hash.new { |result, id| result[id] = {} }
input.each { |hash| result[hash['ID']].merge!(hash['Key'] => hash['Value']) }
result.default = nil # optional: remove the default value
result.each do |id, hash|
(keys - hash.keys).each { |key| hash[key] = '' }
hash['ID'] = id
end
result.values
#=> [{"Field A"=>"123", "Field B"=>"333", "Field C"=>"555", "Field D"=>"", "ID"=>"100"},
# {"Field A"=>"789", "Field B"=>"999", "Field D"=>"444", "Field C"=>"", "ID"=>"200"}]
If you are certain that the values will never be falsy (nil or false), you can replace:
(keys - hash.keys).each { |key| hash[key] = '' }
# with
keys.each { |key| hash[key] ||= '' }
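To make the caveat concrete, here is a small sketch of how the two variants differ when a value is falsy. The row hashes and the stored nil are made up for illustration; they do not come from the question's data.

```ruby
# Hypothetical rows in which "Field A" was explicitly stored as nil.
keys = ["Field A", "Field B"]

# Variant 1: only fill keys that are actually absent.
by_difference = { "Field A" => nil }
(keys - by_difference.keys).each { |key| by_difference[key] = '' }

# Variant 2: ||= also overwrites keys that are present but nil or false.
by_or_assign = { "Field A" => nil }
keys.each { |key| by_or_assign[key] ||= '' }

by_difference #=> {"Field A"=>nil, "Field B"=>""}
by_or_assign  #=> {"Field A"=>"", "Field B"=>""}
```

With string values only, as in the question, both variants should give the same result.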
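I first create a hash, result, to hold the resulting hashes, with a default block that assigns a new empty hash to any missing key. I then look up the correct hash by ID and merge the key-value pair into it. Finally, I add the missing keys to each hash with an empty string as their value, and add the ID under which the hash is stored to the hash itself.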
Note: If your input array contains duplicate key-value pairs, the last one will be used. For example, say both {"ID"=>"100", "Key"=>"Field A", "Value"=>"123"} and {"ID"=>"100", "Key"=>"Field A", "Value"=>"456"} are present. Then "Field A" => "456" will be set, since it's the latter of the two.
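A minimal sketch of that last-one-wins behaviour, using the two example rows from the note (the variable names dup_input and dup_result are only for illustration):

```ruby
# Two rows with the same ID and Key but different values.
dup_input = [
  {"ID"=>"100", "Key"=>"Field A", "Value"=>"123"},
  {"ID"=>"100", "Key"=>"Field A", "Value"=>"456"}
]

dup_result = Hash.new { |hash, id| hash[id] = {} }
# merge! overwrites an existing key, so the later row replaces the earlier one.
dup_input.each { |row| dup_result[row['ID']].merge!(row['Key'] => row['Value']) }

dup_result #=> {"100"=>{"Field A"=>"456"}}
```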
Try the following (passing a block to to_h requires Ruby 2.6 or later):
fields = input.map {|x| x['Key'] }.uniq
output = input.group_by { |x| x['ID'] }
.map { |k,v| ([['ID', k]] + v.map {|z| z.values_at('Key','Value') }).to_h }
output.map! { |x| {'ID' => x['ID']}.merge fields.to_h {|z| [z, x[z].to_s]} }
The output will be:
[
{"ID"=>"100", "Field A"=>"123", "Field B"=>"333", "Field C"=>"555", "Field D"=>""},
{"ID"=>"200", "Field A"=>"789", "Field B"=>"999", "Field C"=>"", "Field D"=>"444"}
]
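It may help to look at the intermediate values. The sketch below breaks the same chain into steps; the names grouped and rows are introduced here for illustration only. Missing fields end up as empty strings because x[z] returns nil for an absent key and nil.to_s is "".

```ruby
input = [
  {"ID"=>"100", "Key"=>"Field A", "Value"=>"123"},
  {"ID"=>"100", "Key"=>"Field B", "Value"=>"333"},
  {"ID"=>"100", "Key"=>"Field C", "Value"=>"555"},
  {"ID"=>"200", "Key"=>"Field A", "Value"=>"789"},
  {"ID"=>"200", "Key"=>"Field B", "Value"=>"999"},
  {"ID"=>"200", "Key"=>"Field D", "Value"=>"444"}
]

# group_by collects the original rows under each ID.
grouped = input.group_by { |x| x['ID'] }
grouped.keys #=> ["100", "200"]

# Each group becomes one hash built from [key, value] pairs.
rows = grouped.map { |k, v| ([['ID', k]] + v.map { |z| z.values_at('Key', 'Value') }).to_h }
rows
#=> [{"ID"=>"100", "Field A"=>"123", "Field B"=>"333", "Field C"=>"555"},
#    {"ID"=>"200", "Field A"=>"789", "Field B"=>"999", "Field D"=>"444"}]

# An absent key reads as nil, and nil.to_s is the empty string.
rows[0]['Field D'].to_s #=> ""
```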
fields = input.map { |hash| hash['Key'] }.uniq # named fields because the last line refers to fields
output = input.group_by { |x| x['ID'] }.map { |k,v| ([['ID', k]] + v.map {|z| z.values_at('Key','Value') }).to_h }
output.map! { |x| {'ID' => x['ID']}.merge fields.map {|z| [z, x[z].to_s]}.to_h }
This gives me the following output:
[
{"ID"=>"100", "Field A"=>"123", "Field B"=>"333", "Field C"=>"555", "Field D"=>""},
{"ID"=>"200", "Field A"=>"789", "Field B"=>"999", "Field C"=>"", "Field D"=>"444"}
]
Thanks, everyone, for participating.
My answer has three steps.
Step 1: Obtain the unique values of "ID" and the unique keys of the form "Field X"
ids, keys = input.map { |h| h.values_at("ID", "Key") }.transpose.map(&:uniq)
#=> [["100", "200"], ["Field A", "Field B", "Field C", "Field D"]]
See Hash#values_at. The calculation proceeds as follows:
a = input.map { |h| h.values_at("ID", "Key") }
#=> [["100", "Field A"], ["100", "Field B"], ["100", "Field C"],
# ["200", "Field A"], ["200", "Field B"], ["200", "Field D"]]
b = a.transpose
#=> [["100", "100", "100", "200", "200", "200"],
# ["Field A", "Field B", "Field C", "Field A", "Field B", "Field D"]]
ids, keys = b.map(&:uniq)
#=> [["100", "200"], ["Field A", "Field B", "Field C", "Field D"]]
ids
#=> ["100", "200"]
keys
#=> ["Field A", "Field B", "Field C", "Field D"]
Step 2: Construct a hash whose keys are the unique values of "ID" and whose values are hashes to be completed and extracted in Step 3
h = ids.each_with_object({}) { |id,h|
h[id] = keys.each_with_object("ID"=>id) { |key,g| g[key] = "" } }
#=> {"100"=>{"ID"=>"100", "Field A"=>"", "Field B"=>"", "Field C"=>"",
# "Field D"=>""},
# "200"=>{"ID"=>"200", "Field A"=>"", "Field B"=>"", "Field C"=>"",
# "Field D"=>""}}
Step 3: Loop through input to complete the values of the hash constructed in Step 2, then, as a final step, extract the values from that hash
input.each_with_object(h) { |g,h| h[g["ID"]].update(g["Key"]=>g["Value"]) }.values
#=> [{"ID"=>"100", "Field A"=>"123", "Field B"=>"333", "Field C"=>"555",
# "Field D"=>""},
# {"ID"=>"200", "Field A"=>"789", "Field B"=>"999", "Field C"=>"",
# "Field D"=>"444"}]
See Hash#update (aka merge!) and Hash#values. The two calculations are as follows:
h = input.each_with_object(h) { |g,h| h[g["ID"]].update(g["Key"]=>g["Value"]) }
#=> {"100"=>{"ID"=>"100", "Field A"=>"123", "Field B"=>"333","Field C"=>"555",
# "Field D"=>""},
# "200"=>{"ID"=>"200", "Field A"=>"789", "Field B"=>"999","Field C"=>"",
# "Field D"=>"444"}}
h.values
#=> <as above>
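If the three steps are needed in more than one place, they could be wrapped in a method. This is only a sketch: the name pivot_rows and the guard for empty input are my additions, not part of the answer above.

```ruby
# Hypothetical helper combining Steps 1 to 3.
def pivot_rows(rows)
  # Guard added here as an assumption: transpose on an empty array
  # would leave ids and keys as nil.
  return [] if rows.empty?

  # Step 1: unique IDs and unique field names.
  ids, keys = rows.map { |h| h.values_at("ID", "Key") }.transpose.map(&:uniq)

  # Step 2: one template hash per ID, every field defaulting to "".
  template = ids.each_with_object({}) do |id, acc|
    acc[id] = keys.each_with_object({ "ID" => id }) { |key, g| g[key] = "" }
  end

  # Step 3: fill in the known values and return the hashes.
  rows.each_with_object(template) { |g, acc| acc[g["ID"]].update(g["Key"] => g["Value"]) }.values
end

sample = [
  {"ID"=>"100", "Key"=>"Field A", "Value"=>"123"},
  {"ID"=>"200", "Key"=>"Field D", "Value"=>"444"}
]
pivoted = pivot_rows(sample)
pivoted
#=> [{"ID"=>"100", "Field A"=>"123", "Field D"=>""},
#    {"ID"=>"200", "Field A"=>"", "Field D"=>"444"}]
```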
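The output structure isn't one I'd want to work with, and the input structure seems to be influencing the desired output. The result is an XY problem.
Hashes are extremely efficient, especially when you have something akin to an indexed field in a database. Iterating over an array to find a value is extremely inefficient compared with a hash lookup, so I'd recommend taking another look at both structures.
Converting the input into a true hash isn't hard: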
input = [
{"ID"=>"100", "Key"=>"Field A", "Value"=>"123"},
{"ID"=>"100", "Key"=>"Field B", "Value"=>"333"},
{"ID"=>"100", "Key"=>"Field C", "Value"=>"555"},
{"ID"=>"200", "Key"=>"Field A", "Value"=>"789"},
{"ID"=>"200", "Key"=>"Field B", "Value"=>"999"},
{"ID"=>"200", "Key"=>"Field D", "Value"=>"444"}
]
output = Hash.new { |h, k| h[k] = {} } # => {}
input.each { |e|
id = e['ID']
key = e['Key']
value = e['Value']
output[id][key] = value
}
This results in:
output
# => {"100"=>{"Field A"=>"123", "Field B"=>"333", "Field C"=>"555"},
# "200"=>{"Field A"=>"789", "Field B"=>"999", "Field D"=>"444"}}
The benefit of this is pretty obvious: if you want the data for "200", it's easy to grab:
output['200'] # => {"Field A"=>"789", "Field B"=>"999", "Field D"=>"444"}
output['200']['Field B'] # => "999"
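If the array-of-hashes layout from the question is still needed, for example for display or export, it can be derived from the nested hash afterwards. A minimal sketch, assuming the nested structure shown above; the names nested, all_keys and rows are introduced here for illustration:

```ruby
nested = {
  "100" => {"Field A"=>"123", "Field B"=>"333", "Field C"=>"555"},
  "200" => {"Field A"=>"789", "Field B"=>"999", "Field D"=>"444"}
}

# Every field name that occurs under any ID, in first-seen order.
all_keys = nested.values.flat_map(&:keys).uniq

rows = nested.map do |id, fields|
  row = { "ID" => id }
  # fetch with a default fills fields that this ID does not have.
  all_keys.each { |key| row[key] = fields.fetch(key, "") }
  row
end

rows
#=> [{"ID"=>"100", "Field A"=>"123", "Field B"=>"333", "Field C"=>"555", "Field D"=>""},
#    {"ID"=>"200", "Field A"=>"789", "Field B"=>"999", "Field C"=>"", "Field D"=>"444"}]
```

Keeping the nested hash as the working structure and producing the flat rows only at the edge keeps lookups by ID cheap.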