Trying to unzip a 600 MB tgz with Ruby gives an out-of-integer-range error
I'm trying to extract a tgz file using the following code:
tar_extract.each do |entry|
  entry_filename = File.basename(entry.full_name)
  next if entry.directory? # don't unzip directories
  next if !entry.file? # if it's not a file skip
  next if entry.full_name.starts_with?('/') # another check
  file_path = File.join(working_directory, entry_filename)
  puts "Writing file: #{file_path}"
  File.open(file_path, 'wb') do |f|
    f.write(entry.read)
  end
  bytes = File.size(file_path)
  puts "Successfully wrote file with #{bytes} bytes"
end
tar_extract.close
This code usually works, but when a file inside the TGZ is too large I get an integer-out-of-range error:
Writing file: /files/working_dir/test1.tar.gz
Successfully wrote file with 244704472 bytes
Writing file: /files/working_dir/test2.sql
RangeError: integer 2556143960 too big to convert to `int'
from /usr/local/rvm/rubies/ruby-2.1.1/lib/ruby/site_ruby/2.1.0/rubygems/package/tar_reader/entry.rb:126:in `read'
I'm not sure what else I should try.
Looking at the Ruby source, this is the relevant block:
##
# Reads +len+ bytes from the tar file entry, or the rest of the entry if
# nil
def read(len = nil)
  check_closed
  return nil if @read >= @header.size
  len ||= @header.size - @read
  max_read = [len, @header.size - @read].min
  ret = @io.read max_read
  @read += ret.size
  ret
end
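The number in the error message hints at why this fails: with no argument, read asks the underlying IO for the whole remaining entry in one call, and that length apparently has to fit in a C int. A quick check, assuming int is a signed 32-bit integer on that machine (the usual case, but an assumption here), with CHUNK as a hypothetical per-call read size:

```ruby
# Assumption: C `int` is a signed 32-bit integer on the machine in question.
INT_MAX = 2**31 - 1        # 2147483647

REQUESTED = 2_556_143_960  # length reported in the RangeError above
CHUNK = 16_000             # hypothetical per-call read size

puts REQUESTED > INT_MAX   # true: one read of the whole entry overflows int
puts CHUNK > INT_MAX       # false: a small fixed read length fits easily
```

So any read length below about 2 GB should avoid the conversion error, whatever the total size of the entry.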
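You can probably work around this by changing the following:
File.open(file_path, 'wb') do |f|
  f.write(entry.read)
end
into a loop in which you call entry.read with an argument giving the maximum number of bytes to process in that iteration. You may need to split this into two calls, because entry.read can return nil, which indicates that there is no more data to process.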
With Joe's guidance, I found the answer.
I changed the File block to:
File.open(file_path, 'wb') do |f|
  while !entry.eof?
    f.write(entry.read(16000)) # 16 KB
  end
end
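For anyone who wants to try the chunked approach end to end, here is a minimal sketch. It is not the exact code from the question: extract_tgz and build_sample_tgz are hypothetical helper names, the archive is a tiny one built in memory, and the loop stops when read returns nil instead of checking eof?, which covers the nil case mentioned in the other answer.

```ruby
require 'rubygems/package'
require 'zlib'
require 'stringio'
require 'tmpdir'

CHUNK_SIZE = 16_000 # bytes per read call, the value used in the answer above

# Extracts every regular file from a gzipped tar stream into dest_dir,
# reading each entry in chunk_size pieces. Returns { file name => bytes written }.
def extract_tgz(tgz_io, dest_dir, chunk_size = CHUNK_SIZE)
  sizes = {}
  gz = Zlib::GzipReader.new(tgz_io)
  Gem::Package::TarReader.new(gz).each do |entry|
    next unless entry.file? # skip directories and other non-file entries

    name = File.basename(entry.full_name) # flatten paths, as the question does
    written = 0
    File.open(File.join(dest_dir, name), 'wb') do |f|
      # read(n) returns nil once the entry is exhausted, which ends the loop
      while (chunk = entry.read(chunk_size))
        written += f.write(chunk)
      end
    end
    sizes[name] = written
  end
  sizes
ensure
  gz&.close
end

# Hypothetical helper: builds a small .tgz in memory so the sketch runs on its own.
def build_sample_tgz(files)
  tar_io = StringIO.new(String.new) # binary, writable buffer
  Gem::Package::TarWriter.new(tar_io) do |tar|
    files.each do |name, data|
      tar.add_file_simple(name, 0o644, data.bytesize) { |io| io.write(data) }
    end
  end
  gz_io = StringIO.new(String.new)
  gz = Zlib::GzipWriter.new(gz_io)
  gz.write(tar_io.string)
  gz.finish # write the gzip trailer without closing gz_io
  gz_io.string
end

tgz = build_sample_tgz('sample.txt' => 'x' * 50_000)
Dir.mktmpdir do |dir|
  extract_tgz(StringIO.new(tgz), dir).each do |name, bytes|
    puts "Successfully wrote #{name} with #{bytes} bytes"
  end
end
```

Only one chunk is held in memory at a time, so peak memory use should stay roughly constant regardless of how large the entry is.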
I chose 16 KB because I ran a number of benchmarks:
require 'benchmark'

b = Benchmark.measure do
  File.open(file_path, 'wb') do |f|
    while !entry.eof?
      f.write(entry.read(16000)) # 16 KB
    end
  end
end
bytes = File.size(file_path)
puts("Successfully wrote file with #{bytes} bytes in #{b.real}")
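If you want to repeat this kind of measurement for several chunk sizes in one run, something along these lines might work. It is only a sketch: copy_in_chunks and benchmark_chunk_sizes are hypothetical helpers, and the source is an in-memory StringIO standing in for a tar entry, so it mostly measures the write side and not decompression.

```ruby
require 'benchmark'
require 'stringio'
require 'tempfile'

# Copies everything from source to path in chunks of chunk_size bytes.
def copy_in_chunks(source, path, chunk_size)
  File.open(path, 'wb') do |f|
    # read(n) returns nil at end of input, which ends the loop
    while (chunk = source.read(chunk_size))
      f.write(chunk)
    end
  end
end

# Times one copy per candidate chunk size; returns { chunk_size => seconds }.
def benchmark_chunk_sizes(data, chunk_sizes)
  chunk_sizes.each_with_object({}) do |size, timings|
    Tempfile.create('chunk-bench') do |tmp|
      source = StringIO.new(data) # stand-in for a tar entry
      timings[size] = Benchmark.realtime { copy_in_chunks(source, tmp.path, size) }
    end
  end
end

data = 'x' * 5_000_000 # hypothetical 5 MB payload
benchmark_chunk_sizes(data, [4_000, 8_000, 16_000, 32_000, 64_000]).each do |size, secs|
  puts format('%6d bytes per read: %.4f s', size, secs)
end
```

Timings from a payload this small will be noisy, so repeat each size a few times and use files of realistic size before drawing conclusions.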
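After some research, it seems that every disk has its own optimal block size. I benchmarked with two files, one of 211 MB and one of 6.6 GB. The results are below; 16 KB to 64 KB turned out to be the best range for my disk.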
2 gb // 2047483648
Successfully wrote file with 7021620216 bytes in 60.360527059
Successfully wrote file with 220613778 bytes in 2.084798686
1 gb // 1073741824
Successfully wrote file with 7021620216 bytes in 42.345642806
Successfully wrote file with 7021620216 bytes in 48.941375145
Successfully wrote file with 7021620216 bytes in 51.501044608
Successfully wrote file with 7021620216 bytes in 58.81474911
Successfully wrote file with 220613778 bytes in 1.57968424
Successfully wrote file with 220613778 bytes in 2.28171993
Successfully wrote file with 220613778 bytes in 5.905203041
Successfully wrote file with 220613778 bytes in 16.944126945
4KB // 4000
Successfully wrote file with 7021620216 bytes in 43.39409191
Successfully wrote file with 7021620216 bytes in 44.572620161
Successfully wrote file with 7021620216 bytes in 48.510513964
Successfully wrote file with 7021620216 bytes in 53.839022034
Successfully wrote file with 220613778 bytes in 1.982647292
Successfully wrote file with 220613778 bytes in 2.071772595
Successfully wrote file with 220613778 bytes in 2.132004983
Successfully wrote file with 220613778 bytes in 2.221654993
8KB // 8000
Successfully wrote file with 7021620216 bytes in 41.851550514
Successfully wrote file with 7021620216 bytes in 45.611952667
Successfully wrote file with 7021620216 bytes in 50.068614205
Successfully wrote file with 7021620216 bytes in 50.726276706
Successfully wrote file with 220613778 bytes in 1.941246687
Successfully wrote file with 220613778 bytes in 2.456356439
Successfully wrote file with 220613778 bytes in 2.56323527
Successfully wrote file with 220613778 bytes in 3.756049832
16KB // 16000
Successfully wrote file with 7021620216 bytes in 36.929413152
Successfully wrote file with 7021620216 bytes in 36.486866289
Successfully wrote file with 7021620216 bytes in 36.743103326
Successfully wrote file with 7021620216 bytes in 37.019910405
Successfully wrote file with 220613778 bytes in 1.504792162
Successfully wrote file with 220613778 bytes in 1.620161067
Successfully wrote file with 220613778 bytes in 1.622070414
Successfully wrote file with 220613778 bytes in 1.698627821
32kB // 32000
Successfully wrote file with 7021620216 bytes in 35.802759912
Successfully wrote file with 7021620216 bytes in 38.775857377
Successfully wrote file with 7021620216 bytes in 39.116311496
Successfully wrote file with 7021620216 bytes in 39.126005469
Successfully wrote file with 220613778 bytes in 1.696821094
Successfully wrote file with 220613778 bytes in 1.773727215
Successfully wrote file with 220613778 bytes in 4.023144931
Successfully wrote file with 220613778 bytes in 4.08615266
64kb // 64000
Successfully wrote file with 7021620216 bytes in 36.732343382
Successfully wrote file with 7021620216 bytes in 37.914365658
Successfully wrote file with 7021620216 bytes in 38.336098907
Successfully wrote file with 7021620216 bytes in 39.146334479
Successfully wrote file with 220613778 bytes in 1.662487522
Successfully wrote file with 220613778 bytes in 1.674177939
Successfully wrote file with 220613778 bytes in 1.745556917
Successfully wrote file with 220613778 bytes in 1.784492717
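To make the comparison easier to read, the runs for the 7021620216-byte file can be averaged per chunk size. The sketch below just copies the timings from the results above and prints the mean time and an approximate throughput; the hash layout and labels are my own.

```ruby
# Wall-clock seconds for the 7021620216-byte file, copied from the results above.
TIMINGS = {
  '2 GB'  => [60.360527059],
  '1 GB'  => [42.345642806, 48.941375145, 51.501044608, 58.81474911],
  '4 KB'  => [43.39409191, 44.572620161, 48.510513964, 53.839022034],
  '8 KB'  => [41.851550514, 45.611952667, 50.068614205, 50.726276706],
  '16 KB' => [36.929413152, 36.486866289, 36.743103326, 37.019910405],
  '32 KB' => [35.802759912, 38.775857377, 39.116311496, 39.126005469],
  '64 KB' => [36.732343382, 37.914365658, 38.336098907, 39.146334479]
}.freeze

FILE_BYTES = 7_021_620_216

# Arithmetic mean of a list of timings.
def mean(runs)
  runs.sum / runs.size
end

TIMINGS.each do |label, runs|
  avg = mean(runs)
  puts format('%-5s mean %.2f s, about %.0f MB/s', label, avg, FILE_BYTES / avg / 1_000_000)
end
```

With only four runs per size (and a single run at 2 GB), the means are a rough guide at best.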