无法在 Azure Databricks 中使用 TIMESTAMP 数据类型创建 Hive table
Not able to create Hive table with TIMESTAMP datatype in Azure Databricks
org.apache.hadoop.hive.ql.metadata.HiveException:
java.lang.UnsupportedOperationException: Parquet does not support
timestamp. See HIVE-6384;
在 Azure Databricks 中执行以下代码时出现上述错误。
spark_session.sql("""
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time TIMESTAMP
)
PARTITIONED BY (
Date DATE)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION "/mnt/data_analysis/pre-processed/"
""")
根据 Hive-6384 Jira,从 Hive-1.2 开始,您可以在 parquet 中使用 Timestamp,date
类型表。
Hive < 1.2 版本的解决方法:
1.使用字符串类型:
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time STRING
)
PARTITIONED BY (
Date STRING)
Stored as parquet
Location '/mnt/data_analysis/pre-processed/';
然后在处理时您可以将 arrival_time
、Date
转换为 timestamp
、date
类型。
使用 view
并转换列,但 views are slow.
2. Using ORC format:
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time Timestamp
)
PARTITIONED BY (
Date date)
Stored as orc
Location '/mnt/data_analysis/pre-processed/';
ORC同时支持timestamp
,date
类型
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Parquet does not support timestamp. See HIVE-6384;
在 Azure Databricks 中执行以下代码时出现上述错误。
spark_session.sql("""
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time TIMESTAMP
)
PARTITIONED BY (
Date DATE)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION "/mnt/data_analysis/pre-processed/"
""")
根据 Hive-6384 Jira,从 Hive-1.2 开始,您可以在 parquet 中使用 Timestamp,date
类型表。
Hive < 1.2 版本的解决方法:
1.使用字符串类型:
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time STRING
)
PARTITIONED BY (
Date STRING)
Stored as parquet
Location '/mnt/data_analysis/pre-processed/';
然后在处理时您可以将 arrival_time
、Date
转换为 timestamp
、date
类型。
使用 view
并转换列,但 views are slow.
2. Using ORC format:
CREATE EXTERNAL TABLE IF NOT EXISTS dev_db.processing_table
(
campaign STRING,
status STRING,
file_name STRING,
arrival_time Timestamp
)
PARTITIONED BY (
Date date)
Stored as orc
Location '/mnt/data_analysis/pre-processed/';
ORC同时支持timestamp
,date
类型