Is it better to compress every line individually or compress an entire file?

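My use case is that I write entries to files throughout the day. I can either compress each entry as I write it, or compress the whole file after the fact. The files can get fairly large (around 10 GB uncompressed), and I write to several files at the same time. Another consideration is that I could split the files at a finer granularity, which would reduce the disk buffer that per-file compression needs. There may be no clear right or wrong answer here; I just want to know whether there are other factors I should consider.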

After compression, the files will be uploaded to some storage medium for archiving and possible later analysis.

Per-line compression

| Pros | Cons |
| --- | --- |
| More space efficient while writing | More complicated to implement |
| More space efficient while reading, since I can decompress at per-entry granularity | Less efficient in terms of disk space usage than compressing an entire file |

Per-file compression

| Pros | Cons |
| --- | --- |
| Better compression on a per-file basis, since there is more data that can be compressed | Requires a bigger buffer of disk space to handle writes throughout the day before compressing |
| Simpler to implement: write normally to the file and compress afterwards using simple Linux tools | |

Unless you have very, very long lines, individual lines will hardly compress at all. Have you tried it?
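A quick way to try it is to measure both approaches on a sample of your own entries. The sketch below uses Python's `zlib` module; the log line format and the `compare_sizes` helper are made up for illustration, so substitute real entries before drawing conclusions.

```python
import zlib


def compare_sizes(lines):
    """Return (raw, per_line, whole) sizes in bytes for a list of byte strings."""
    raw = sum(len(line) for line in lines)
    # Each line is compressed as its own independent zlib stream.
    per_line = sum(len(zlib.compress(line)) for line in lines)
    # All lines are compressed together as a single stream.
    whole = len(zlib.compress(b"".join(lines)))
    return raw, per_line, whole


# Hypothetical log entries; replace with a sample of real data.
lines = [
    f"2024-01-01T00:00:{i % 60:02d} INFO request id={i} status=200 path=/api/items\n".encode()
    for i in range(1000)
]

raw, per_line, whole = compare_sizes(lines)
print(f"raw={raw} per_line={per_line} whole={whole}")
```

Every independently compressed line pays for its own stream header and checksum and cannot reference repeated text in earlier lines, which is why the per-line total is usually much larger than the single-stream result for short entries.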

You can get the best of both worlds by accumulating lines until you have enough data to compress, and then writing them to the file. gzlog does this.
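The accumulate-then-compress idea can be sketched roughly as follows. This is not gzlog's own implementation; it swaps in a simpler technique, appending each batch to the file as a separate gzip member, which relies on the gzip format allowing concatenated members. The class name `BatchedGzipWriter` and the `batch_bytes` threshold are assumptions chosen for the example.

```python
import gzip
import os
import tempfile


class BatchedGzipWriter:
    """Buffer entries in memory and append them to a file as gzip members."""

    def __init__(self, path, batch_bytes=64 * 1024):
        self.path = path
        self.batch_bytes = batch_bytes  # hypothetical threshold; tune for your data
        self._pending = []
        self._pending_size = 0

    def write(self, entry: bytes):
        self._pending.append(entry)
        self._pending_size += len(entry)
        if self._pending_size >= self.batch_bytes:
            self.flush()

    def flush(self):
        if not self._pending:
            return
        # Compress the whole batch as one complete gzip member and append it.
        member = gzip.compress(b"".join(self._pending))
        with open(self.path, "ab") as f:
            f.write(member)
        self._pending.clear()
        self._pending_size = 0


# Usage: write entries, then read the file back with the standard gzip module.
path = os.path.join(tempfile.mkdtemp(), "entries.log.gz")
writer = BatchedGzipWriter(path, batch_bytes=4096)
entries = [f"entry {i}\n".encode() for i in range(2000)]
for entry in entries:
    writer.write(entry)
writer.flush()  # write out whatever is still buffered

with gzip.open(path, "rb") as f:
    restored = f.read()
```

The trade-off in this sketch is that entries still sitting in the in-memory buffer are lost if the process crashes before `flush()` runs, so a smaller `batch_bytes` gives up some compression ratio in exchange for less data at risk.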