How to achieve change of schema in parquet format

Question

We are facing what is really just a design question.

I have an external Hive table in Parquet format with the following columns:

describe payments_user
col_name,data_type,comment
('amount_hold', 'int', '')
('id', 'int', '')
('transaction_id', 'string', '')
('recipient_id', 'string', '')
('year', 'string', '')
('month', 'string', '')
('day', 'string', '')
('', None, None)
('# Partition Information', None, None)
('# col_name            ', 'data_type           ', 'comment             ')
('', None, None)
('year', 'string', '')
('month', 'string', '')
('day', 'string', '')

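We receive data daily and ingest it dynamically into year, month and day partitions. If the data changes on the source side, for example they add a new column and send the batch file, how can we ingest that data? I know Avro has this capability, but to reduce rework, how can this be achieved with the Parquet format?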

If it were Avro, what would the procedure be?

Answer 1

What you are looking for is schema evolution. Hive supports it for Parquet, but with some limitations compared to Avro.
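
For the case in the question, where a new column is added at the source, a minimal HiveQL sketch could look like the following. The column name `currency` and the partition values in the query are hypothetical, chosen only for illustration. It also assumes Hive resolves Parquet columns by name, which I believe is the default, so check this against your Hive version.

```sql
-- Assumption: the source starts sending a hypothetical new column `currency`.
-- ADD COLUMNS appends it after the existing non-partition columns.
-- CASCADE (Hive 1.1.0 and later) also updates the metadata of existing partitions.
ALTER TABLE payments_user ADD COLUMNS (currency STRING) CASCADE;

-- Older Parquet files do not contain the column, so their rows
-- should return NULL for it. Partition values here are made up.
SELECT id, amount_hold, currency
FROM payments_user
WHERE year = '2019' AND month = '01' AND day = '15';
```

Adding columns at the end is generally the safe case. Renaming columns, dropping them, or changing their types is where the limitations tend to show up, so test those changes on a copy of the table first.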

Schema evolution in parquet format

hive

avro

hiveql

impala

parquet