Using SQL/Presto/Athena, merge records by ID, keeping the most recent value of each column by timestamp

Question

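Is it possible to merge records that share the same ID, taking each column's value from the record with the most recent timestamp?

For example, suppose a table has the following records for user_id = 1: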

user_id	name	address	city	state	op_id	benefit	phone	insert_date_timestamp
1					0			2021-06-22 15:06:29.083534
1							99999999	2021-06-22 15:06:29.153258
1						N		2021-06-22 15:03:29.153258
1					1			2021-06-22 15:01:29.153258
1		999 St	Phoenix	AZ				2021-06-22 14:06:29.153258
1	John	Doe						2021-06-21 15:06:29.153258

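As you can see, multiple new entries were inserted over time. If I merge all records from the oldest to the most recent, so that each column keeps its latest non-empty value, the current record would be: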

Result

user_id	name	address	city	state	op_id	benefit	phone	insert_date_timestamp
1	John	999 St	Phoenix	AZ	0	N	99999999	2021-06-22 15:06:29.083534

How can I achieve this using SQL? Is it possible to produce the same result with a Presto/Athena query?

PS: I know this can be done with PySpark, pandas, etc., but my use case is Athena.

Thanks!!

Solution

select distinct user_id,
       first_value(name) ignore nulls over (partition by user_id order by insert_date desc rows between unbounded preceding and unbounded following) as name,
       first_value(address) ignore nulls over (partition by user_id order by insert_date desc rows between unbounded preceding and unbounded following) as address,
       . . .
from t;

Answer 1

You can use first_value(), ordering each window so that non-null values come first and, among those, the most recent comes first:

select distinct user_id,
       first_value(name) over (partition by user_id order by (name is not null) desc, insert_date desc rows between unbounded preceding and unbounded following) as name,
       first_value(address) over (partition by user_id order by (address is not null) desc, insert_date desc rows between unbounded preceding and unbounded following) as address,
       . . .
from t;

Or, if you prefer ignore nulls:

select distinct user_id,
       first_value(name) ignore nulls over (partition by user_id order by insert_date desc rows between unbounded preceding and unbounded following) as name,
       first_value(address) ignore nulls over (partition by user_id order by insert_date desc rows between unbounded preceding and unbounded following) as address,
       . . .
from t;
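
As a possible alternative to the window-function queries above (not part of the original answer), the same merge can be sketched as a plain aggregation using Presto's max_by() together with an aggregate FILTER clause, which avoids select distinct. The inline sample data, the reduced column set, and the use of string timestamps are assumptions made only to keep the sketch self-contained; the column name insert_date follows the answer's queries rather than the question's insert_date_timestamp.

```sql
-- Hypothetical sample rows mirroring part of the question's data
-- (city, state, benefit and phone are omitted for brevity).
-- Timestamps are fixed-width strings here, which sort in the same order
-- as real timestamps; the query itself does not depend on that choice.
with t (user_id, name, address, op_id, insert_date) as (
    values
        (1, cast(null as varchar), cast(null as varchar), 0,    '2021-06-22 15:06:29.083534'),
        (1, null,                  null,                  1,    '2021-06-22 15:01:29.153258'),
        (1, null,                  '999 St',              null, '2021-06-22 14:06:29.153258'),
        (1, 'John',                'Doe',                 null, '2021-06-21 15:06:29.153258')
)
select user_id,
       -- for each column, take the value from the most recent row
       -- in which that column is not null
       max_by(name, insert_date)    filter (where name is not null)    as name,
       max_by(address, insert_date) filter (where address is not null) as address,
       max_by(op_id, insert_date)   filter (where op_id is not null)   as op_id,
       -- timestamp of the latest row for this user
       max(insert_date)                                                 as insert_date
from t
group by user_id;
```

Each remaining column would follow the same pattern. Because this is a group by rather than a window over every row, it returns one row per user_id directly, which may be easier to read on wide tables; whether it performs better than the first_value() versions would need to be checked on the actual data.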

sql

presto

amazon-athena