How to populate a Hive table with all timestamps (86400) in a day

Question

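I want a Hive table with 4 columns <date, key, timestamp, count>. There can be multiple keys, and count should be the number 0. For each key, I need a record for every second of the day. For example, with 2 keys A and B, I want 86400 records per key in the table, from 00:00:00 to 23:59:59.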

I know about the current_timestamp function, but I'm not sure whether it fits here.

Date, Key, Timestamp, Count
2019-05-31, A, 00:00:00, 0
2019-05-31, A, 00:00:01, 0
2019-05-31, A, 00:00:02, 0
.
.
.
2019-05-31, A, 23:59:59, 0
2019-05-31, B, 00:00:00, 0
2019-05-31, B, 00:00:01, 0
2019-05-31, B, 00:00:02, 0
.
.
.
2019-05-31, B, 23:59:59, 0

Answer 1

This query generates the required timestamps:

 select from_unixtime(unix_timestamp('2019-05-31 00:00:00')+i) as ts 
   from (select 
               posexplode(split(space(86399),' ')) as (i,x)
        )s

Explanation:

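Subquery s generates one row per second. If you join against such a subquery, for example with a cross join (this depends on your initial dataset), each of your rows is repeated 86400 times.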

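space(86399) - produces a string of 86399 spaces
split() - splits that string on the space character, giving an array of 86400 empty strings
posexplode - explodes the array and generates rows with the position and the element. The position (i) is in the range 0 - 86399; we use it as the number of seconds to add to the start timestamp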
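To see the row generator on a small scale, here is a minimal sketch that uses 2 spaces instead of 86399. It assumes, as the full query does, that split keeps the empty strings on both sides of each separator, so it should return three rows with positions 0, 1 and 2:

```sql
-- space(2) is a string of 2 spaces; splitting it on ' ' yields 3 empty strings,
-- so posexplode emits positions 0..2, each paired with an empty element x.
select i, x
  from (select posexplode(split(space(2), ' ')) as (i, x)) s;
```

Only the position i is used afterwards; the element x is a throwaway value.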

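unix_timestamp('2019-05-31 00:00:00') - gives the start timestamp as the number of seconds since the Unix epoch. We add the number of seconds (i) to it and convert the result back to a timestamp string with from_unixtime, so each row gets a timestamp incremented by 1 second.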
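The arithmetic can be checked on its own with a one-row query. This is a sketch only: the column aliases are made up for illustration, and the epoch value in start_sec depends on the session time zone, so no specific number is claimed for it.

```sql
-- Convert the start of the day to epoch seconds, add an offset, convert back.
select unix_timestamp('2019-05-31 00:00:00')                     as start_sec,
       from_unixtime(unix_timestamp('2019-05-31 00:00:00') + 1)     as second_ts,
       from_unixtime(unix_timestamp('2019-05-31 00:00:00') + 86399) as last_ts;
```

As long as the session time zone has no clock change on that day, second_ts should be 2019-05-31 00:00:01 and last_ts should be 2019-05-31 23:59:59, which matches the last offset produced by posexplode.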

Join with it, and use substr if you need separate date and time values.
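Putting the pieces together, the four columns from the question could be produced roughly as follows. This is a sketch under stated assumptions: the key list is hard-coded with stack purely for illustration, the CTE names and the column aliases dt, tm and cnt are hypothetical, and it relies on from_unixtime returning a string in the default 'yyyy-MM-dd HH:mm:ss' format.

```sql
with keys as (                 -- hypothetical key list; replace with your own source of keys
  select stack(2, 'A', 'B') as (key)
),
seconds as (                   -- one row per second of the day, i = 0..86399
  select posexplode(split(space(86399), ' ')) as (i, x)
),
expanded as (                  -- every key paired with every second
  select k.key,
         from_unixtime(unix_timestamp('2019-05-31 00:00:00') + s.i) as ts
    from keys k
         cross join seconds s
)
select substr(ts, 1, 10) as dt,   -- date part, e.g. 2019-05-31
       key,
       substr(ts, 12, 8) as tm,   -- time part, e.g. 00:00:00
       0                 as cnt   -- count is always 0
  from expanded;
```

To populate a table rather than just select the rows, the final select can be turned into an insert into your target table. Depending on configuration, Hive may also require cartesian products to be explicitly allowed before it runs a cross join.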

Demo:

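For example, if your initial table contains two rows, each with a timestamp and one of the keys A and B, you can cross join it with the subquery that generates the seconds: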

with your_table as( --This is initial data example
select stack(2,
'2019-05-31 00:00:00', 'A', 
'2019-05-31 00:00:00', 'B'
) as (ts, Key)
)

select min(ts), max(ts), key --aggregated result for the demo
from
(
select from_unixtime(unix_timestamp(t.ts)+i) as ts , t.key
  from your_table t
      cross join (select posexplode(split(space(86399),' ')) as (i,x))s
)s group by key

Result (aggregation added because the query generates too many rows to display):

min                 max                 key
2019-05-31 00:00:00 2019-05-31 23:59:59 B
2019-05-31 00:00:00 2019-05-31 23:59:59 A
