Is there a Parquet equivalent for Python?

I've just discovered Parquet, and it meets my "big" data processing / (local) storage needs:

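Faster than relational databases, which are designed to run over a network (adding overhead) and just aren't as fast as a solution designed for local storage.
Compared to JSON or CSV: it stores data efficiently in typed form (instead of everything being a string) and can read specific chunks of a file more dynamically than JSON or CSV can.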

But to my dismay, while Node.js has a fully functioning library for it, the only Parquet lib for Python seems to be quite literally a half-measure:

parquet-python is a pure-python implementation (currently with only read-support) of the parquet format ... Not all parts of the parquet-format have been implemented yet or tested e.g. nested data

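So what gives? Is there something better than Parquet that Python already supports, and that lowers interest in developing a Parquet library? Is there some close alternative?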

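Actually, you can read and write Parquet with pandas, which is commonly used for data jobs (as opposed to ETL on big data). To handle Parquet, pandas relies on one of two common packages: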

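pyarrow is a cross-platform tool that provides a columnar format for in-memory data. Parquet is also a columnar format, and pyarrow supports it, although pyarrow handles a variety of formats and is a broader library.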

fastparquet focuses specifically on the Parquet format and is meant for Python-based big-data workflows.