How to use Ruby and Nokogiri to parse XML
This document is the output of a firewall configuration. I am trying to build a hash of the firewall rules. Later I will write this data out to CSV, the console, or whatever else I need:
<table index="44" title=" from PUBLIC to DMZ administrative service rules on Firewall01" ref="FILTER.BLACKLIST">
<headings>
<heading>Rule</heading>
<heading>Action</heading>
<heading>Source</heading>
<heading>Destination</heading>
<heading>Service</heading>
<heading>Log</heading>
</headings>
<tablebody>
<tablerow>
<tablecell><item>test_inbound</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.452">[Group] test_b2_group</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>Yes</item></tablecell>
</tablerow>
<tablerow>
<tablecell><item>host02_inbound</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.447">[Group] host02_group</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>Yes</item></tablecell>
</tablerow>
<tablerow>
<tablecell><item>randomhost</item></tablecell>
<tablecell><item>Allow</item></tablecell>
**<tablecell><item gotoref="CONFIG.3.383">[Group] Host_group_2</item><item gotoref="CONFIG.3.382">[Group] another_server</item></tablecell>**
<tablecell><item gotoref="CONFIG.3.510">[Group] crazy_application</item><item gotoref="CONFIG.3.511">[Group] internal_app</item><item gotoref="CONFIG.3.525">[Group] online_application</item></tablecell>
<tablecell><item gotoref="CONFIG.3.783">[Group] junos-https</item></tablecell>
<tablecell><item>No</item></tablecell>
</tablerow>
</tablebody>
</table>
We have a row of column headings and three firewall rules.
Here is my code:
#!/usr/bin/env ruby
require 'nokogiri'
require 'csv'
fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []
fwpol.xpath('./table/tablebody/tablerow').each do |item|
rules = {}
rules[:name] = item.xpath('./tablecell/item')[0].text
rules[:action] = item.xpath('./tablecell/item')[1].text
rules[:source] = item.xpath('./tablecell/item')[2].text
rule_array << rules
end
puts rule_array
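The first two hash entries, :name and :action, work fine because those fields hold only one value.
When I run the code, it does not print every value where a cell contains more than one. The bold XML line shows what I mean. I need to iterate over those values somehow, but my attempts so far have not worked.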
You can get the text of multiple elements as an Array like this:
require 'nokogiri'
require 'csv'
fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
rule_array = []
fwpol.xpath('./table/tablebody/tablerow').each do |item|
rules = {}
rules[:name] = item.xpath('./tablecell[1]/item').text
rules[:action] = item.xpath('./tablecell[2]/item').text
rules[:source] = item.xpath('./tablecell[3]/item').map(&:text)
rule_array << rules
end
puts rule_array
Here is the output:
{:name=>"test_inbound", :action=>"Allow", :source=>["[Group] test_b2_group"]}
{:name=>"host02_inbound", :action=>"Allow", :source=>["[Group] host02_group"]}
{:name=>"randomhost", :action=>"Allow", :source=>["[Group] Host_group_2", "[Group] another_server"]}
I would do it like this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<table index="44" title=" from PUBLIC to DMZ administrative service rules on Firewall01" ref="FILTER.BLACKLIST">
<tablebody>
<tablerow>
<tablecell><item>test_inbound</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.452">[Group] test_b2_group</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>[Host] Any</item></tablecell>
<tablecell><item>Yes</item></tablecell>
</tablerow>
<tablerow>
<tablecell><item>randomhost</item></tablecell>
<tablecell><item>Allow</item></tablecell>
<tablecell><item gotoref="CONFIG.3.383">[Group] Host_group_2</item><item gotoref="CONFIG.3.382">[Group] another_server</item></tablecell>
<tablecell><item gotoref="CONFIG.3.510">[Group] crazy_application</item><item gotoref="CONFIG.3.511">[Group] internal_app</item><item gotoref="CONFIG.3.525">[Group] online_application</item></tablecell>
<tablecell><item gotoref="CONFIG.3.783">[Group] junos-https</item></tablecell>
<tablecell><item>No</item></tablecell>
</tablerow>
</tablebody>
</table>
EOT
rule_array = doc.search('tablerow').map{ |row|
name, action, source = row.search('tablecell')[0, 3].map{ |tc| tc.search('item').map(&:text) }
{
name: name,
action: action,
source: source
}
}
When run, this leaves rule_array holding an array of hashes, the last of which has two item entries under :source:
require 'ap'
ap rule_array
# >> [
# >> [0] {
# >> :name => [
# >> [0] "test_inbound"
# >> ],
# >> :action => [
# >> [0] "Allow"
# >> ],
# >> :source => [
# >> [0] "[Group] test_b2_group"
# >> ]
# >> },
# >> [1] {
# >> :name => [
# >> [0] "randomhost"
# >> ],
# >> :action => [
# >> [0] "Allow"
# >> ],
# >> :source => [
# >> [0] "[Group] Host_group_2",
# >> [1] "[Group] another_server"
# >> ]
# >> }
# >> ]
Note: don't do this:
fwpol = File.open(ARGV[0]) { |f| Nokogiri::XML(f) }
It is simpler to use:
fwpol = Nokogiri::XML(File.read(ARGV[0]))
And instead of doing:
item.xpath('./tablecell/item')[0].text
item.xpath('./tablecell/item')[1].text
item.xpath('./tablecell/item')[2].text
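just find the tablecell tags once, take the slice you want with [0, 3], and iterate over that group. It avoids running the same XPath query three times per row, so it is faster and reduces code duplication.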