In Hive, how to select only the values of one of the dynamic partitions (when there are one or more partitions available)

I have a table with the following structure:

hive> desc clicks_fact;
OK
time                    timestamp
..
day                     date
file_date               varchar(8)

# Partition Information
# col_name              data_type               comment

day                     date
file_date               varchar(8)
Time taken: 1.075 seconds, Fetched: 28 row(s)

Now I want to get the partitions of this table:

hive> show partitions clicks_fact;
OK
day=2016-09-02/file_date=20160902
..
day=2017-06-30/file_date=20170629
Time taken: 0.144 seconds, Fetched: 27 row(s)

Each partition comes back as the combination of both day and file_date. Is there a way to get only the values of file_date?

Hive provides only very limited options for retrieving metadata.
Query the metastore directly instead.
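If you cannot reach the metastore database, a client-side workaround is to post-process the text that SHOW PARTITIONS prints. The Python sketch below assumes that output has been captured as text (for example with `hive -e 'show partitions clicks_fact'`). The function name `partition_values` is hypothetical, and the `%XX` unescaping is an assumption about how Hive escapes special characters in partition names. The sample input reuses the lines shown in the question.

```python
from urllib.parse import unquote


def partition_values(lines, key):
    """Collect the distinct values of one partition key from SHOW PARTITIONS output.

    Each partition line is assumed to look like 'day=2016-09-02/file_date=20160902'.
    """
    values = set()
    for line in lines:
        line = line.strip()
        if "=" not in line:
            continue  # skip status lines such as "OK" or "Time taken: ..."
        for spec in line.split("/"):
            name, _, value = spec.partition("=")
            if name == key:
                # Assumption: special characters are %XX-escaped in the partition name.
                values.add(unquote(value))
    # Sorted as strings; for yyyymmdd values this is also chronological order.
    return sorted(values)


output = """OK
day=2016-09-02/file_date=20160902
day=2017-06-30/file_date=20170629
Time taken: 0.144 seconds, Fetched: 27 row(s)"""

print(partition_values(output.splitlines(), "file_date"))
# ['20160902', '20170629']
```

This only sees what the CLI prints, so it works without metastore credentials, but it depends on the textual output format staying stable.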

Demo

Hive

create table clicks_fact (i int) partitioned by (day date,file_date int)
;

alter table clicks_fact add
    partition (day=date '2016-09-02',file_date=20160901)
    partition (day=date '2016-09-02',file_date=20160902)
    partition (day=date '2016-09-03',file_date=20160901)
    partition (day=date '2016-09-03',file_date=20160902)
    partition (day=date '2016-09-03',file_date=20160903)
;

Metastore (MySQL)

use metastore;


select  distinct
        pkv.PART_KEY_VAL
        
from            DBS                 as d

        join    TBLS                as t
        
        on      t.DB_ID =
                d.DB_ID

        join    PARTITION_KEYS      as pk
        
        on      pk.TBL_ID =
                t.TBL_ID

        join    PARTITIONS          as p
        
        on      p.TBL_ID =
                t.TBL_ID       

        join    PARTITION_KEY_VALS  as pkv
        
        on      pkv.PART_ID =
                p.PART_ID
                
            and pkv.INTEGER_IDX =
                pk.INTEGER_IDX       

where   d.NAME       = 'local_db'
    and t.TBL_NAME   = 'clicks_fact'
    and pk.PKEY_NAME = 'file_date'

order by pkv.PART_KEY_VAL
;

+--------------+
| PART_KEY_VAL |
+--------------+
|     20160901 |
|     20160902 |
|     20160903 |
+--------------+
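To see why the join works without a live metastore, here is a minimal Python sketch that loads the five demo partitions into an in-memory SQLite database and runs the same join. Assumptions: SQLite stands in for MySQL; only the columns the query touches are modeled (the real metastore tables have more); the IDs are arbitrary; `build_demo_metastore` and `partition_key_values` are hypothetical helper names. The point it illustrates is that PARTITION_KEY_VALS holds one row per (partition, key position), and matching its INTEGER_IDX against PARTITION_KEYS selects the position of the requested key.

```python
import sqlite3

# Simplified stand-ins for the metastore tables the query joins.
# Only the columns used by the query are modeled (an assumption for this demo).
SCHEMA = """
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE PARTITION_KEYS (TBL_ID INTEGER, PKEY_NAME TEXT, INTEGER_IDX INTEGER);
CREATE TABLE PARTITIONS (PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER);
CREATE TABLE PARTITION_KEY_VALS (PART_ID INTEGER, PART_KEY_VAL TEXT, INTEGER_IDX INTEGER);
"""

# Same join as the MySQL query above, parameterized and with an explicit ORDER BY.
QUERY = """
SELECT DISTINCT pkv.PART_KEY_VAL
FROM DBS AS d
JOIN TBLS AS t ON t.DB_ID = d.DB_ID
JOIN PARTITION_KEYS AS pk ON pk.TBL_ID = t.TBL_ID
JOIN PARTITIONS AS p ON p.TBL_ID = t.TBL_ID
JOIN PARTITION_KEY_VALS AS pkv
  ON pkv.PART_ID = p.PART_ID AND pkv.INTEGER_IDX = pk.INTEGER_IDX
WHERE d.NAME = ? AND t.TBL_NAME = ? AND pk.PKEY_NAME = ?
ORDER BY pkv.PART_KEY_VAL
"""


def build_demo_metastore():
    """Populate an in-memory DB with the five partitions from the Hive demo."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO DBS VALUES (1, 'local_db')")
    con.execute("INSERT INTO TBLS VALUES (1, 1, 'clicks_fact')")
    # Partition keys in declaration order: day is index 0, file_date is index 1.
    con.executemany("INSERT INTO PARTITION_KEYS VALUES (1, ?, ?)",
                    [("day", 0), ("file_date", 1)])
    demo_partitions = [
        ("2016-09-02", "20160901"),
        ("2016-09-02", "20160902"),
        ("2016-09-03", "20160901"),
        ("2016-09-03", "20160902"),
        ("2016-09-03", "20160903"),
    ]
    for part_id, (day, file_date) in enumerate(demo_partitions, start=1):
        con.execute("INSERT INTO PARTITIONS VALUES (?, 1)", (part_id,))
        # One row per partition key; values are stored as strings.
        con.executemany("INSERT INTO PARTITION_KEY_VALS VALUES (?, ?, ?)",
                        [(part_id, day, 0), (part_id, file_date, 1)])
    con.commit()
    return con


def partition_key_values(con, db_name, table_name, key_name):
    """Return the distinct values of one partition key of one table."""
    rows = con.execute(QUERY, (db_name, table_name, key_name)).fetchall()
    return [value for (value,) in rows]


con = build_demo_metastore()
print(partition_key_values(con, "local_db", "clicks_fact", "file_date"))
# ['20160901', '20160902', '20160903']
```

Passing `"day"` as the key name returns the distinct `day` values the same way, since only the PKEY_NAME filter changes.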