How to parse a string into a hash table in Ruby
I am getting data as a string from a remote device and need to parse it. The data usually looks like this:
MO                SCGR  SC         RSITE           ALARM_SITUATION
RXOTG-59          59    0          EK0322          ABIS PATH FAULT
RXOCF-59                           EK0322          LOCAL MODE
RXOTRX-59-0       4                EK0322          LOCAL MODE
RXOTRX-59-1                        EK0322          LOCAL MODE
RXOTRX-59-4             0          EK0322          LOCAL MODE
RXOTRX-59-5       1     3          EK0322          LOCAL MODE
RXOTRX-59-8                        EK0322          LOCAL MODE
RXOTRX-59-9                        EK0322          LOCAL MODE
I would like to have the data as an array of arrays, or any other structure that is easy to work with programmatically.
I split the data into an array of lines with:
str.split("\r\n")
Then I remove the extra spaces from each element of the array:
tsgs.map! {|tsg| tsg.gsub(/\s+/, " ").split(" ") }
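But this has a limitation: empty cells are not taken into account. I expect each array to contain five elements, but some contain fewer.
Case 1: here I get the expected result:
RXOTG-59          59    0          EK0322          ABIS PATH FAULT
is converted to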
["RXOTG-59", "59", "0", "EK0322", "ABIS PATH FAULT"]
Case 2: here I get an unexpected result:
RXOTRX-59-9                        EK0322          LOCAL MODE
is converted to
["RXOTRX-59-9", "EK0322", "LOCAL MODE"]
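def getCommandResult(tgdatas)
  tgdatas_arr = tgdatas.split("\r\n")
  # Keep only the data rows: skip the preamble and stop before the "END" marker.
  tsgs = tgdatas_arr[5..tgdatas_arr.index("END")-2]
  # Collapse runs of whitespace and keep the first token of each row.
  tsgs.map! {|tsg| tsg.gsub(/\s+/, " ").split(" ")[0] }
  return tsgs
end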
Given the following data_string, see if this works for you:
data_string = <<TEXT
MO                SCGR  SC         RSITE           ALARM_SITUATION
RXOTG-59          59    0          EK0322          ABIS PATH FAULT
RXOCF-59                           EK0322          LOCAL MODE
RXOTRX-59-0       4                EK0322          LOCAL MODE
RXOTRX-59-1                        EK0322          LOCAL MODE
RXOTRX-59-4             0          EK0322          LOCAL MODE
RXOTRX-59-5       1     3          EK0322          LOCAL MODE
RXOTRX-59-8                        EK0322          LOCAL MODE
RXOTRX-59-9                        EK0322          LOCAL MODE
TEXT
Set the starting offset of each column, since the cells appear to be aligned with the header.
data = data_string.split("\n")
starts = [0, 18, 24, 35, 51, (data.map(&:size)).max ]
Then slice each line at those offsets, stripping the trailing whitespace from every cell:
data = data.map { |line| starts.each_cons(2).map { |a,b| line[a..b-1].strip } }
So you end up with this array:
# [["MO", "SCGR", "SC", "RSITE", "ALARM_SITUATION"],
#  ["RXOTG-59", "59", "0", "EK0322", "ABIS PATH FAULT"],
#  ["RXOCF-59", "", "", "EK0322", "LOCAL MODE"],
#  ["RXOTRX-59-0", "4", "", "EK0322", "LOCAL MODE"],
#  ["RXOTRX-59-1", "", "", "EK0322", "LOCAL MODE"],
#  ["RXOTRX-59-4", "", "0", "EK0322", "LOCAL MODE"],
#  ["RXOTRX-59-5", "1", "3", "EK0322", "LOCAL MODE"],
#  ["RXOTRX-59-8", "", "", "EK0322", "LOCAL MODE"],
#  ["RXOTRX-59-9", "", "", "EK0322", "LOCAL MODE"]]
You can then convert it to hashes or use the csv library to work with your data.
Here is one way to produce an array of hashes:
headers = data[0]
body = data[1..]
body.map { |line| headers.map(&:to_sym).zip(line).to_h }
#=> [{:MO=>"RXOTG-59", :SCGR=>"59", :SC=>"0", :RSITE=>"EK0322", :ALARM_SITUATION=>"ABIS PATH FAULT"}, {:MO=>"RXOCF-59", :SCGR=>"", :SC=>"", :RSITE=>"EK0322", :ALARM_SITUATION=>"LOCAL MODE"}, ...
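The start offsets above are typed in by hand. As a minimal sketch, assuming no header name contains a space, they could instead be read off the header line. The helper names column_starts and parse_fixed_width are made up for this illustration, and the sample rows are rebuilt with ljust so that the spacing is explicit:

```ruby
# Hypothetical helper: offsets at which each header name begins.
# Assumes header names never contain a space.
def column_starts(header)
  starts = []
  header.scan(/\S+/) { starts << Regexp.last_match.begin(0) }
  starts
end

# Hypothetical helper: slice every line at the header's column offsets.
def parse_fixed_width(text)
  lines  = text.lines.map(&:chomp).reject(&:empty?)
  starts = column_starts(lines.first)
  # Each column runs up to the next start; the last one runs to end of line.
  ranges = starts.each_cons(2).map { |a, b| a...b } + [(starts.last..)]
  # line[range] is nil when the line is too short, hence to_s.
  lines.map { |line| ranges.map { |r| line[r].to_s.strip } }
end

sample = [
  "MO".ljust(18) + "SCGR".ljust(6) + "SC".ljust(11) + "RSITE".ljust(16) + "ALARM_SITUATION",
  "RXOTG-59".ljust(18) + "59".ljust(6) + "0".ljust(11) + "EK0322".ljust(16) + "ABIS PATH FAULT",
  "RXOTRX-59-4".ljust(24) + "0"
].join("\n")

rows = parse_fixed_width(sample)
rows[2] # => ["RXOTRX-59-4", "", "0", "", ""]
```

A line that ends early, like the last one, still yields five cells, because the missing slices come back as empty strings.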
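String#unpack with the "A" directive works well for fixed-width strings. The field widths in the format string must match the actual column layout of your data.
str = "RXOTRX-59-9                        EK0322          LOCAL MODE"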
p str.unpack("A20A4A11A16A15" ) # => ["RXOTRX-59-9", "", "", "EK0322", "LOCAL MODE"]
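If the column start offsets are known, the unpack format can be built from them rather than written out by hand. This is only a sketch: the offsets 0, 18, 24, 35 and 51 are an assumption about the layout, and the sample line is built with ljust to make the spacing explicit. Note that unpack counts bytes, so this suits ASCII data.

```ruby
# Assumed column start offsets; adjust to the real layout.
starts = [0, 18, 24, 35, 51]

# Width of every column except the last, which takes the rest of the line.
widths = starts.each_cons(2).map { |a, b| b - a }
format = widths.map { |w| "A#{w}" }.join + "A*"
# format is "A18A6A11A16A*"

line = "RXOTRX-59-5".ljust(18) + "1".ljust(6) + "3".ljust(11) +
       "EK0322".ljust(16) + "LOCAL MODE"

fields = line.unpack(format)
# => ["RXOTRX-59-5", "1", "3", "EK0322", "LOCAL MODE"]
```

Using "A*" for the final field avoids having to know the width of the longest alarm text.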
Your string [1], slightly modified:
data = <<END
MO                SCGR  SC         RSITE           ALARM_SITUATION
RXOTG-59          59    0          EK0322          ABIS PATH FAULT
RXOCF-59                           EK0322          LOCAL MODE
RXOTRX-59-0       4                EK0322          LOCAL MODE
RXOTRX-59-1                        EK0322          LOCAL MODE
RXOTRX-59-4             0
RXOTRX-59-5       1     3          EK0322          LOCAL MODE
RXOTRX-59-8                        EK0322          LOCAL MODE
RXOTRX-59-9                        EK0322          LOCAL MODE
END
This string looks a lot like a CSV data structure, so we might convert it to a CSV string, which allows us to use the methods provided by the CSV class.
Converting the string to a CSV string
Code
def convert_to_csv(data)
  # Offsets of the space just before each header name after the first.
  cols = data[/.+?\n/].gsub(/ \S/).map { Regexp.last_match.begin(0) }
  data.each_line.map do |s|
    # Put a comma at every offset the line is long enough to reach.
    cols.each { |i| s[i] = ',' if s.size > i+1 }
    s.gsub(/ *, */, ',')
  end.join
end
Converting the string
Now convert the string data to a CSV string.
csv_data = convert_to_csv(data)
puts csv_data
MO,SCGR,SC,RSITE,ALARM_SITUATION
RXOTG-59,59,0,EK0322,ABIS PATH FAULT
RXOCF-59,,,EK0322,LOCAL MODE
RXOTRX-59-0,4,,EK0322,LOCAL MODE
RXOTRX-59-1,,,EK0322,LOCAL MODE
RXOTRX-59-4,,0
RXOTRX-59-5,1,3,EK0322,LOCAL MODE
RXOTRX-59-8,,,EK0322,LOCAL MODE
RXOTRX-59-9,,,EK0322,LOCAL MODE
Explanation
The steps are as follows.
s = data[/.+?\n/]
#=> "MO                SCGR  SC         RSITE           ALARM_SITUATION\n"
e0 = s.gsub(/ \S/)
#=> #<Enumerator: "MO ... ALARM_SITUATION\n":gsub(/ \S/)>
cols = e0.map { Regexp.last_match.begin(0) }
#=> [17, 23, 34, 50]
e1 = data.each_line
#=> #<Enumerator: "MO ... LOCAL MODE\n":each_line>
a = e1.map do |s|
  cols.each { |i| s[i] = ',' if s.size > i+1 }
  s.gsub(/ *, */,',')
end
#=> ["MO,SCGR,SC,RSITE,ALARM_SITUATION\n",
# "RXOTG-59,59,0,EK0322,ABIS PATH FAULT\n",
# ...
# "RXOTRX-59-9,,,EK0322,LOCAL MODE\n"]
a.join
#=> < return value above >
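Let's take a closer look at the calculation of a. First, the block variable s is assigned the first element generated by the enumerator e1:
s = e1.next
#=> "MO                SCGR  SC         RSITE           ALARM_SITUATION\n"
The block calculation is then performed:
cols.each { |i| s[i] = ',' }
s #=> "MO               ,SCGR ,SC        ,RSITE          ,ALARM_SITUATION\n"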
s.gsub(/ *, */,',')
#=> "MO,SCGR,SC,RSITE,ALARM_SITUATION\n"
The regular expression used with gsub reads, "match zero or more spaces, followed by a comma, followed by zero or more spaces".
When the short line is passed to the block, the following calculations are performed.
s = "RXOTRX-59-4             0"
s.size
#=> 25
cols
#=> [17, 23, 34, 50]
cols.each { |i| s[i] = ',' if s.size > i+1 }
s #=> "RXOTRX-59-4      ,     ,0"
s.gsub(/ *, */,',')
#=> "RXOTRX-59-4,,0"
The remaining elements of e1 are processed similarly.
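One thing convert_to_csv does not handle is a cell that itself contains a comma, which would be read as an extra field. A possible variation, sketched here under the assumption that header names contain no spaces, slices each line at the header offsets and lets CSV.generate_line do the quoting. The method name fixed_width_to_csv is made up for this illustration.

```ruby
require 'csv'

# Hypothetical alternative to convert_to_csv that quotes cells when needed.
def fixed_width_to_csv(text)
  lines  = text.lines.map(&:chomp)
  starts = []
  lines.first.scan(/\S+/) { starts << Regexp.last_match.begin(0) }
  bounds = starts + [nil] # nil marks "to end of line"
  lines.map do |line|
    fields = bounds.each_cons(2).map do |a, b|
      cell = (b ? line[a...b] : line[a..]).to_s.strip
      cell.empty? ? nil : cell # nil is written as an empty field
    end
    CSV.generate_line(fields)
  end.join
end

text = [
  "MO".ljust(18) + "SCGR".ljust(6) + "SC".ljust(11) + "RSITE".ljust(16) + "ALARM_SITUATION",
  "RXOTG-59".ljust(18) + "59".ljust(6) + "0".ljust(11) + "EK0322".ljust(16) + "ABIS PATH, FAULT"
].join("\n")

csv_text = fixed_width_to_csv(text)
puts csv_text
```

Unlike convert_to_csv, this version writes all five fields for a short line, so such a line ends with trailing commas instead of stopping early.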
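Converting the CSV string to hashes
We can now use CSV methods. For example, suppose we wish to create an array of hashes whose keys are the header elements, downcased and converted to symbols, with the values for "SCGR" and "SC" converted to integers. To do that we use the class method CSV::new, specifying appropriate values for the method's options.
Constructing the hashes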
require 'csv'
CSV.new(csv_data, headers: true, header_converters: :symbol,
converters: :all).to_a.map(&:to_h)
#=> [{:mo=>"RXOTG-59", :scgr=>59, :sc=>0, :rsite=>"EK0322",
# :alarm_situation=>"ABIS PATH FAULT"},
# {:mo=>"RXOCF-59", :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
# :alarm_situation=>"LOCAL MODE"},
# {:mo=>"RXOTRX-59-0", :scgr=>4, :sc=>nil, :rsite=>"EK0322",
# :alarm_situation=>"LOCAL MODE"},
# {:mo=>"RXOTRX-59-1", :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
# :alarm_situation=>"LOCAL MODE"},
# {:mo=>"RXOTRX-59-4", :scgr=>nil, :sc=>0, :rsite=>nil,
# :alarm_situation=>nil},
# {:mo=>"RXOTRX-59-5", :scgr=>1, :sc=>3, :rsite=>"EK0322",
# :alarm_situation=>"LOCAL MODE"},
# {:mo=>"RXOTRX-59-8", :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
# :alarm_situation=>"LOCAL MODE"},
# {:mo=>"RXOTRX-59-9", :scgr=>nil, :sc=>nil, :rsite=>"EK0322",
# :alarm_situation=>"LOCAL MODE"}]
Explanation
The steps are as follows.
csv = CSV.new(csv_data, headers: true, header_converters: :symbol,
  converters: :all)
#=> <#CSV io_type:StringIO encoding:UTF-8 lineno:0 col_sep:","
#   row_sep:"\n" quote_char:"\"" headers:true>
a = csv.to_a
#=> [#<CSV::Row mo:"RXOTG-59" scgr:59 sc:0 rsite:"EK0322" alarm_situation:"ABIS PATH FAULT">,
# #<CSV::Row mo:"RXOCF-59" scgr:nil sc:nil rsite:"EK0322" alarm_situation:"LOCAL MODE">,
# ...
# #<CSV::Row mo:"RXOTRX-59-9" scgr:nil sc:nil rsite:"EK0322" alarm_situation:"LOCAL MODE">]
a.map(&:to_h)
#=> < hash shown above >
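The same result can probably be reached a little more directly with CSV.parse, which returns a table whose rows can be mapped to hashes. The sketch below uses a shortened csv_data and the narrower :integer converter in place of :all; both choices are assumptions made for the illustration.

```ruby
require 'csv'

# Shortened sample: one full row and one row that stops early.
csv_data = <<~CSV_TEXT
  MO,SCGR,SC,RSITE,ALARM_SITUATION
  RXOTG-59,59,0,EK0322,ABIS PATH FAULT
  RXOTRX-59-4,,0
CSV_TEXT

rows = CSV.parse(csv_data, headers: true, header_converters: :symbol,
                 converters: :integer).map(&:to_h)

# Empty and missing cells come back as nil, so they are easy to filter on.
with_alarm = rows.reject { |h| h[:alarm_situation].nil? }
with_alarm.map { |h| h[:mo] } # => ["RXOTG-59"]
```

Because missing cells are nil rather than empty strings, a check such as h[:rsite].nil? is enough to spot the incomplete rows.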
[1] To run this, the heredoc must not be indented (alternatively, keep it indented and change the first line to data = <<-END.lines.map(&:lstrip).join).