How to decompress pbzip2 data in a memory buffer using the libbz2 library in C++

I have a working version that decompresses bzip2 data by calling the BZ2_bzDecompress API. It looks like this:

while (bytes_input < len) {
  isDone = false;

  // Initialize the input buffer and its length
  size_t in_buffer_size = len - bytes_input;
  the_bz2_stream.avail_in = in_buffer_size;
  the_bz2_stream.next_in = (char*)data + bytes_input;

  size_t out_buffer_size =
      output_size - bytes_uncompressed;  // size of output buffer
  if (out_buffer_size == 0) {  // out of space in the output buffer
    break;
  }

  the_bz2_stream.avail_out = out_buffer_size;
  the_bz2_stream.next_out =
      (char*)output + bytes_uncompressed;  // output buffer

  ret = BZ2_bzDecompress(&the_bz2_stream);
  if (ret != BZ_OK && ret != BZ_STREAM_END) {
    throw Bzip2Exception("Bzip2 failed. ", ret);
  }

  bytes_input += in_buffer_size - the_bz2_stream.avail_in;
  bytes_uncompressed += out_buffer_size - the_bz2_stream.avail_out;

  *data_consumed = bytes_input;

  if (ret == BZ_STREAM_END) {
    ret = BZ2_bzDecompressEnd(&the_bz2_stream);
    if (ret != BZ_OK) {
      throw Bzip2Exception("Bzip2 fail. ", ret);
    }
    isDone = true;
  }
}

This works fine for files compressed with plain bzip2, but for pbzip2 (parallel bzip2) and "splittable" bzip2 data it throws BZ_PARAM_ERROR.

The pbzip2 documentation says this:

Data compressed with pbzip2 is broken into multiple streams and each stream is bzip2 compressed looking like this: [-----|-----|-----|-----|-----|-----|-----|-----|-----]

If you are writing software with libbzip2 to decompress data created with pbzip2, you must take into account that the data contains multiple bzip2 streams so you will encounter end-of-stream markers from libbzip2 after each stream and must look-ahead to see if there are any more streams to process before quitting. The bzip2 program itself will automatically handle this condition.

Source: http://compression.ca/pbzip2/

Can someone tell me how to handle this? Should I be using some other libbz2 API?

Also, pbzip2 files are compatible with the normal "bunzip2" command. How does bzip2 handle this gracefully when my code throws BZ_PARAM_ERROR?

Thanks.

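After your BZ2_bzDecompressEnd(), you need to call BZ2_bzDecompressInit() again (you must already be calling it once before that loop) if there is still data left to decompress, i.e. if bytes_input < len.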

To decompress each |-----| block you need one init, some number of decompress calls, and one end. So if you still have input left over, you need to do another init, n*decompress, end.

Make sure you do the final end, to avoid a big memory leak.

You get BZ_PARAM_ERROR because you are trying to decompress with an uninitialized bz_stream. Once you have called BZ2_bzDecompressEnd(), you can't use that bz_stream again unless you call BZ2_bzDecompressInit() on it.