Zip and unzip a large file without loading the entire file into memory in Apache Camel

Question

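We are using Apache Camel to compress and decompress our files. We use the standard .marshal().gzip() and .unmarshal().gzip() APIs.

Our problem is that when we get very large files, say 800 MB to more than 1 GB, our application runs out of memory because the entire file is loaded into memory for compression and decompression.

Is there any Camel API or Java library that can zip/unzip a file without loading the whole file into memory?

There is also a similar unanswered question here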

Answer 1

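Explanation

Use a different approach: stream the file.

That is, do not load it into memory in full. Instead, read it one small chunk at a time and write each chunk out before reading the next.

Get an InputStream for the file and wrap it in some kind of gzip input stream, such as GZIPInputStream. Read bytes from it and write them to an OutputStream.

To compress an archive, do the reverse: wrap the OutputStream in some kind of gzip output stream, such as GZIPOutputStream, and write the bytes you read from the plain InputStream to it.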

Code

The example uses Apache Commons Compress, but the logic stays the same for any library.

Decompressing a gz archive:

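import java.io.BufferedInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

Path inputPath = Paths.get("archive.tar.gz");
Path outputPath = Paths.get("archive.tar");
int bufferSize = 8192; // chunk size; 8 KiB is an assumed value, tune as needed

// All three streams are declared in the try header so each one is closed
try (InputStream fin = Files.newInputStream(inputPath);
        GzipCompressorInputStream in = new GzipCompressorInputStream(
            new BufferedInputStream(fin));
        OutputStream out = Files.newOutputStream(outputPath)) {
    // Copy one buffer at a time; at most bufferSize bytes are held at once
    final byte[] buffer = new byte[bufferSize];
    int n;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}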

Compressing into a gz archive:

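import java.io.BufferedOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

Path inputPath = Paths.get("archive.tar");
Path outputPath = Paths.get("archive.tar.gz");
int bufferSize = 8192; // chunk size; 8 KiB is an assumed value, tune as needed

// The gzip stream must be closed so that buffered data and the gzip
// trailer are written; declaring it in the try header takes care of that
try (InputStream in = Files.newInputStream(inputPath);
        OutputStream fout = Files.newOutputStream(outputPath);
        GzipCompressorOutputStream out = new GzipCompressorOutputStream(
            new BufferedOutputStream(fout))) {
    // Copy one buffer at a time; at most bufferSize bytes are held at once
    final byte[] buffer = new byte[bufferSize];
    int n;
    while (-1 != (n = in.read(buffer))) {
        out.write(buffer, 0, n);
    }
}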
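
Since the copy loop is the same whichever library provides the gzip streams, here is a minimal sketch of both directions using only the JDK's java.util.zip classes, in case adding Commons Compress is not an option. The class name GzipStreaming, the method names and the 8 KiB buffer size are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the names are illustrative only.
final class GzipStreaming {

    // Assumed chunk size; memory use stays around this size regardless of file size.
    private static final int BUFFER_SIZE = 8192;

    /** Gzips source into target, one buffer at a time. */
    static void compress(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream fout = Files.newOutputStream(target);
             GZIPOutputStream out = new GZIPOutputStream(fout, BUFFER_SIZE)) {
            copy(in, out);
        } // closing the GZIPOutputStream writes the gzip trailer
    }

    /** Gunzips source into target, one buffer at a time. */
    static void decompress(Path source, Path target) throws IOException {
        try (InputStream fin = Files.newInputStream(source);
             GZIPInputStream in = new GZIPInputStream(fin, BUFFER_SIZE);
             OutputStream out = Files.newOutputStream(target)) {
            copy(in, out);
        }
    }

    // The same read/write loop as in the Commons Compress examples.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Small round-trip demo on temporary files.
    public static void main(String[] args) throws IOException {
        Path original = Files.createTempFile("sample", ".txt");
        Path zipped = Files.createTempFile("sample", ".gz");
        Path restored = Files.createTempFile("restored", ".txt");
        Files.write(original, "hello streaming gzip".getBytes(StandardCharsets.UTF_8));
        compress(original, zipped);
        decompress(zipped, restored);
        System.out.println(Arrays.equals(
                Files.readAllBytes(original), Files.readAllBytes(restored)));
    }
}
```

In a Camel route, one option might be to call methods like these from a custom Processor or bean in place of .marshal().gzip(); that wiring is not shown here and would need to be checked against your Camel version.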

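If you are more comfortable with them, you can also wrap the streams in a BufferedReader and a PrintWriter. They manage buffering themselves, and you can read and write lines instead of bytes. Note that this only works if the file consists of lines of text, not some other format.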
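
A rough sketch of that line-based variant, using only JDK classes and assuming UTF-8 text. The class name GzipLines and its method names are hypothetical. Two caveats: a file with one enormous line would still be read into memory as a single String, and PrintWriter swallows I/O errors, so the sketch calls checkError() explicitly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the names are illustrative only.
final class GzipLines {

    /** Reads a text file line by line and writes it gzip-compressed. */
    static void compressLines(Path source, Path target) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             PrintWriter writer = new PrintWriter(new OutputStreamWriter(
                     new GZIPOutputStream(Files.newOutputStream(target)),
                     StandardCharsets.UTF_8))) {
            copyLines(reader, writer);
        }
    }

    /** Reads a gzip-compressed text file line by line and writes it as plain text. */
    static void decompressLines(Path source, Path target) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                     new GZIPInputStream(Files.newInputStream(source)),
                     StandardCharsets.UTF_8));
             PrintWriter writer = new PrintWriter(
                     Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            copyLines(reader, writer);
        }
    }

    private static void copyLines(BufferedReader reader, PrintWriter writer) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            writer.print(line);
            writer.print('\n'); // line endings are normalized to \n
        }
        // PrintWriter does not throw on failure, so check (and flush) explicitly
        if (writer.checkError()) {
            throw new IOException("failed while writing lines");
        }
    }
}
```

Because line endings are normalized, the output is not guaranteed to be byte-identical to the input; for binary data or exact copies, the byte-buffer loop is the safer choice.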
