Parse an array of strings
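I have a one-dimensional array of strings ( Array{String,1} ) describing a matrix of floats (see below). I need to parse this matrix. Any slick suggestions?
- Julia 1.5
- macOS
Yes, I did read this from a file. I don't want to read the whole file with CSV, because I want to keep the option of reading the whole file with memory-mapped I/O, which I don't think CSV offers. Also, I have some complicated lines that mix strings and numbers, as well as lines of strings only, that I need to parse, which rules out DelimitedFiles. The columns are separated by two spaces.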
julia> lines[24+member_total:idx-1]
49-element Array{String,1}:
 "0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  2.1998550E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  3.1998450E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  4.1998350E+00  1.3308000E+01"
 ⋮
 "0.0000000E+00  0.0000000E+00  5.9699895E+01  1.4000000E-01"
 "0.0000000E+00  0.0000000E+00  6.0199890E+01  1.0100000E-01"
 "0.0000000E+00  0.0000000E+00  6.0699885E+01  6.2000000E-02"
 "0.0000000E+00  0.0000000E+00  6.1199880E+01  2.3000000E-02"
 "0.0000000E+00  0.0000000E+00  6.1500000E+01  0.0000000E+00"
I solved the problem. It's not the slickest thing, but it works...
function rmspaces(line)
    # Treat tabs as spaces
    line = replace(line, "\t" => " ")
    # Collapse double spaces until only single spaces remain
    while occursin("  ", line)
        line = replace(line, "  " => " ")
    end
    return line
end
function readmatrix(lines, numcolumns::Int64; type=Float64)
    # Collapse repeated spaces to one (note: this modifies `lines` in place)
    for i = 1:length(lines)
        lines[i] = rmspaces(lines[i])
    end
    matrix = zeros(type, length(lines), numcolumns)
    for i = 1:length(lines)
        idx = 1   # set the initial stop at the beginning
        spot = 1  # column to fill next
        for j = 1:length(lines[i])
            if lines[i][j] == ' ' && j > 1  # stop at spaces
                number = parse(type, lines[i][idx:j])  # from the last stop to this one
                idx = j  # remember this stop
                matrix[i, spot] = number
                spot += 1
            end
        end
        # If there is no space after the last number, it has not been stored yet,
        # so parse it here. If it was already stored, spot exceeds numcolumns.
        if spot < numcolumns + 1
            number = parse(type, lines[i][idx:end])
            matrix[i, spot] = number
        end
    end
    return matrix
end
strs = ["0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01",
        "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01",
        "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"]
# Split each line on the two-space separator, parse the fields,
# transpose to a row, and stack the rows vertically
mapreduce(vcat, strs) do s
    (parse.(Float64, split(s, "  ")))'
end
3×4 Array{Float64,2}:
 0.0  0.0  0.0       13.308
 0.0  0.0  0.199875  13.308
 0.0  0.0  1.19986   13.308
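If the separator is not reliably two spaces (single spaces or tabs may creep in), a minimal sketch along the same lines could call `split(s)` with no delimiter, which splits on any run of whitespace. The helper name `parse_rows` and the sample data with mixed separators are hypothetical, not part of the original posts.

```julia
# Hypothetical helper: parse whitespace-separated numeric rows into a matrix.
function parse_rows(lines::AbstractVector{<:AbstractString}; T::Type=Float64)
    # split(s) with no delimiter splits on any run of whitespace
    rows = [parse.(T, split(s)) for s in lines]
    # each row becomes a 1×n matrix; vcat stacks them into an m×n matrix
    return reduce(vcat, permutedims.(rows))
end

# Made-up sample mixing two spaces, a tab, and a single space
strs = ["0.0000000E+00  0.0000000E+00\t1.9987500E-01 1.3308000E+01",
        "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"]
A = parse_rows(strs)
```

This assumes every line has the same number of fields; an empty or short line would make `vcat` fail with a dimension mismatch.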
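I strongly advise against reinventing the wheel and using custom parsers, because such solutions tend not to be robust enough in production.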
If your file is in a single String, use:
using DelimitedFiles
readdlm(IOBuffer(strs))  # here strs is one String holding the whole file
If your file is a Vector of Strings, use:
cat(readdlm.(IOBuffer.(strsa))..., dims=1)
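As a possible variation, the lines can be joined into one newline-separated buffer so that `readdlm` runs once instead of once per line. This is only a sketch: the sample contents of `strsa` are made up, and passing `Float64` assumes every field is numeric.

```julia
using DelimitedFiles

# Made-up sample standing in for the Vector of Strings
strsa = ["0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01",
         "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01"]

# Join the lines into a single buffer and parse it in one pass;
# without an explicit delimiter, readdlm splits on runs of whitespace
M = readdlm(IOBuffer(join(strsa, "\n")), Float64)
```

Splatting many small matrices into `cat` can get slow for long inputs, which is the main reason to consider a single call.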
Finally, memory mapping does not conflict with using CSV:
using Mmap
s = open("d.txt")  # d.txt contains your lines
# if you want to read & write, use the "r+" option ("w+" would truncate the file)
m = Mmap.mmap(s, Vector{UInt8})  # memory mapping of your file
readdlm(IOBuffer(m))
Also, regardless of memory mapping, you can always rewind the stream to the beginning and read the data:
seek(s, 0)
readdlm(s)
seek(s, 0)  # reset the stream
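To tie this back to the question, where only a slice of the lines is numeric, one possible sketch is to split the memory-mapped bytes into lines and hand just that slice to `readdlm`. The file contents, the temporary file, and the slice bounds below are made up for illustration.

```julia
using DelimitedFiles, Mmap

# Write a small mixed file (made-up contents) to a temporary path
path, io = mktemp()
write(io, "header text\n1.0E+00  2.0E+00\n3.0E+00  4.0E+00\nfooter text\n")
close(io)

# Memory-map the file and split the mapped bytes into lines
lines = open(path) do s
    m = Mmap.mmap(s, Vector{UInt8})
    readlines(IOBuffer(m))
end

# Parse only the numeric slice (lines 2 to 3 in this made-up file)
block = readdlm(IOBuffer(join(lines[2:3], "\n")), Float64)
```

The string and mixed lines stay available in `lines` for whatever custom handling they need, while the numeric block goes through the standard parser.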