How to get distinct values of a column by finding max timestamp for each and then get rest of the columns as well
I have a large Oracle (Oracle Database 12c Enterprise Edition Release 12.1.0.2.0) table, say table_name, that is updated every 15 seconds.
It has many columns, but the ones I care about are:
Name Null? Type
--------------- -------- ---------------------------------
ID_1 NOT NULL NUMBER(38)
UTC_TIMESTAMP NOT NULL TIMESTAMP(6) WITH TIME ZONE
ID_2 VARCHAR2(8)
SERVER_NAME VARCHAR2(256)
ID_3 NUMBER(38)
COUNT_1 NUMBER(38)
COUNT_2 NUMBER(38)
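What I want to do:
1) Get all records where UTC_TIMESTAMP <= current_date and UTC_TIMESTAMP > current_date - 5 minutes (roughly 125K-150K records).
2) This data will have repeated ID_1 values, so for each ID_1 I only want the record with the max(UTC_TIMESTAMP) among its duplicates. The result should then have distinct ID_1 values.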
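The window in step 1 is half-open: a row exactly five minutes old is excluded, while a row stamped at the current instant is included. A minimal Python sketch of that predicate, assuming timezone-aware `datetime` values stand in for `TIMESTAMP WITH TIME ZONE` (the function name `in_window` is hypothetical, and the fixed "now" is chosen to match the sample data below):

```python
from datetime import datetime, timedelta, timezone


def in_window(ts: datetime, now: datetime, minutes: int = 5) -> bool:
    """Return True when now - minutes < ts <= now.

    Both arguments must be timezone-aware; aware datetimes compare by the
    absolute instant, regardless of their UTC offsets.
    """
    return now - timedelta(minutes=minutes) < ts <= now


now = datetime(2017, 7, 24, 15, 45, tzinfo=timezone.utc)
bst = timezone(timedelta(hours=1))  # fixed +01:00 offset, like London in July

print(in_window(now, now))                                            # upper bound included
print(in_window(now - timedelta(minutes=5), now))                     # lower bound excluded
print(in_window(datetime(2017, 7, 24, 16, 42, 48, tzinfo=bst), now))  # 15:42:48 UTC, inside
```

Because the comparison is on the instant rather than the wall-clock text, a value written with a +01:00 offset is placed correctly in a UTC window, which is the behaviour you would want from the SQL filter as well.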
What I tried: the following SQL
with temp_1 as (
select m.ID_2, m.ID_1, max(utc_timestamp) max_utc_timestamp
from commsdesk.table_name m
where m.ID_2 = 'TWC'
group by m.ID_2, m.ID_1)
select f.utc_timestamp
from commsdesk.table_name f
join temp_1 t
on t.max_utc_timestamp = f.utc_timestamp
and t.ID_2 = f.ID_2
and t.ID_1 = f.ID_1;
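Problem: I only get ID_2, ID_1 and UTC_TIMESTAMP, but I want all the other columns as well.
Can this be done in SQL?
There are about 2,200 distinct ID_1 values and about 125K-150K records in the 5-minute window.
So copying the 125K-150K records into an Excel sheet and filtering on each of the 2,200 ID_1 values to find the max UTC_TIMESTAMP for each is not practical.
But if there is a quick way to do it with a macro, I could do that too.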
Sample dummy data:
ID_2 SERVER_NAME ID_3 ID_1 UTC_TIMESTAMP COUNT_1 COUNT_2
ABC PQRS.ABC.TPO 2 303 24-JUL-17 03.41.55.000000000 PM +00:00 4 0
ABC PQRS.ABC.TPO 2 1461 24-JUL-17 03.42.48.000000000 PM +00:00 1 7
ABC PQRS.ABC.TPO 2 1 24-JUL-17 03.41.36.000000000 PM +00:00 2 3
ABC PQRS.ABC.TPO 2 1461 24-JUL-17 03.41.16.000000000 PM +00:00 0 8
ABC PQRS.ABC.TPO 1 1 24-JUL-17 03.41.11.000000000 PM +00:00 5 0
ABC SRP.ROP.MTP 1 1 24-JUL-17 03.41.23.000000000 PM +00:00 0 0
ABC SRP.ROP.MTP 2 303 24-JUL-17 03.41.34.000000000 PM +00:00 0 0
ABC SRP.ROP.MTP 2 1461 24-JUL-17 03.41.31.000000000 PM +00:00 0 0
ABC SRP.ROP.MTP 4 303 24-JUL-17 03.41.26.000000000 PM +00:00 4 8
ABC SRP.ROP.MTP 2 303 24-JUL-17 03.41.20.000000000 PM +00:00 0 0
ABC SRP.ROP.MTP 1 1461 24-JUL-17 03.41.01.000000000 PM +00:00 3 8
ABC SRP.ROP.MTP 4 1 24-JUL-17 03.41.18.000000000 PM +00:00 9 1
Expected output:
ID_1 UTC_TIMESTAMP COUNT_1 COUNT_2
1 24-JUL-17 03.41.36.000000000 PM +00:00 2 3
303 24-JUL-17 03.41.55.000000000 PM +00:00 4 0
1461 24-JUL-17 03.42.48.000000000 PM +00:00 1 7
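To check the expected output by hand, here is a small Python sketch of the "latest row per ID_1" step over the sample rows. Only ID_1, UTC_TIMESTAMP, COUNT_1 and COUNT_2 are kept, the times are shortened to strings on 2017-07-24 UTC, and the function name `latest_per_id` is hypothetical:

```python
# Sample rows from the question: (id_1, utc_timestamp, count_1, count_2).
# Zero-padded "HH:MM:SS" strings on the same day sort in time order.
rows = [
    (303, "15:41:55", 4, 0),
    (1461, "15:42:48", 1, 7),
    (1, "15:41:36", 2, 3),
    (1461, "15:41:16", 0, 8),
    (1, "15:41:11", 5, 0),
    (1, "15:41:23", 0, 0),
    (303, "15:41:34", 0, 0),
    (1461, "15:41:31", 0, 0),
    (303, "15:41:26", 4, 8),
    (303, "15:41:20", 0, 0),
    (1461, "15:41:01", 3, 8),
    (1, "15:41:18", 9, 1),
]


def latest_per_id(rows):
    """Keep, for each id_1, the row with the greatest timestamp.

    On a tie the first row seen is kept. Results are sorted by id_1.
    """
    latest = {}
    for row in rows:
        id_1, ts = row[0], row[1]
        if id_1 not in latest or ts > latest[id_1][1]:
            latest[id_1] = row
    return [latest[k] for k in sorted(latest)]


for row in latest_per_id(rows):
    print(row)
```

This is a single pass with a dictionary keyed by ID_1, so it scales linearly with the number of rows in the window rather than with the number of distinct IDs times rows.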
You can use the keep (dense_rank last ...) version of the max() aggregate function (or, if you prefer, first and min, ordering by utc_timestamp desc), for example:
select id_1,
max(utc_timestamp),
max(id_2) keep (dense_rank last order by utc_timestamp) as id_2,
max(server_name) keep (dense_rank last order by utc_timestamp) as server_name,
max(id_3) keep (dense_rank last order by utc_timestamp) as id_3,
max(count_1) keep (dense_rank last order by utc_timestamp) as count_1,
max(count_2) keep (dense_rank last order by utc_timestamp) as count_2
from table_name
where utc_timestamp > current_timestamp - interval '5' minute
and utc_timestamp <= current_timestamp
group by id_1
order by id_1;
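The query groups by id_1, and since you want the latest timestamp, max(utc_timestamp) is a 'normal' aggregate. For the other columns, keep (dense_rank last order by utc_timestamp) keeps the values associated with the row that has the maximum timestamp for that id_1.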
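A Python sketch of what each max(...) keep (dense_rank last order by utc_timestamp) expression computes for one id_1 group: find the highest timestamp, then apply MAX only to the rows that share it. The function name and the tied rows are made up for illustration. It also shows a caveat worth knowing: if two rows tie on the latest timestamp, each column is resolved independently, so the returned values can come from different rows.

```python
def max_keep_dense_rank_last(rows, value_key, order_key):
    """Emulate MAX(value_key) KEEP (DENSE_RANK LAST ORDER BY order_key).

    Assumes order_key is never NULL (UTC_TIMESTAMP is NOT NULL in the question).
    """
    if not rows:
        return None
    top = max(r[order_key] for r in rows)  # ordering value of the "last" rank
    candidates = [r[value_key] for r in rows
                  if r[order_key] == top and r[value_key] is not None]
    return max(candidates) if candidates else None  # MAX ignores NULLs


# Hypothetical rows for one id_1, with two rows tied on the latest timestamp.
group = [
    {"utc_timestamp": "15:41:36", "server_name": "PQRS.ABC.TPO", "count_1": 2},
    {"utc_timestamp": "15:41:36", "server_name": "SRP.ROP.MTP", "count_1": 0},
    {"utc_timestamp": "15:41:11", "server_name": "PQRS.ABC.TPO", "count_1": 5},
]

# server_name comes from the second row, count_1 from the first.
print(max_keep_dense_rank_last(group, "server_name", "utc_timestamp"))
print(max_keep_dense_rank_last(group, "count_1", "utc_timestamp"))
```

If ties on (id_1, utc_timestamp) are possible in the real data, adding a tie-breaking column to each order by would keep the columns consistent with a single row.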
With some dummy data:
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (1, systimestamp at time zone 'UTC' - interval '30' second, 'TWC', 'test1', 301, 1, 1);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (1, systimestamp at time zone 'UTC' - interval '60' second, 'TWC', 'test2', 302, 2, 2);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (1, systimestamp at time zone 'UTC' - interval '90' second, 'TWC', 'test3', 303, 3, 3);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (2, systimestamp at time zone 'UTC' - interval '45' second, 'TWC', 'test4', 304, 4, 4);
insert into table_name (id_1, utc_timestamp, id_2, server_name, id_3, count_1, count_2)
values (2, systimestamp at time zone 'UTC' - interval '15' second, 'TWC', 'test5', 305, 5, 5);
That query gives:
ID_1 MAX(UTC_TIMESTAMP) ID_2 SERVE ID_3 COUNT_1 COUNT_2
---------- --------------------------- -------- ----- ---------- ---------- ----------
1 2017-07-21 18:38:22.944 UTC TWC test1 301 1 1
2 2017-07-21 18:38:38.399 UTC TWC test5 305 5 5
You can get the same result with something closer to your attempt:
with cte as (
select id_1, max(utc_timestamp) max_utc_timestamp
from table_name m
where utc_timestamp > current_timestamp - interval '5' minute
and utc_timestamp <= current_timestamp
group by id_1
)
select t.id_1, t.utc_timestamp, t.id_2, t.server_name, t.id_3, t.count_1, t.count_2
from cte
join table_name t on t.id_1 = cte.id_1
and t.utc_timestamp = cte.max_utc_timestamp
order by t.id_1;
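...assuming the combination of id_1 and utc_timestamp is unique (I'm not sure why you are using id_2 in the join; maybe it's needed for uniqueness?). But this is less efficient, because it has to hit the real table twice: once to find the maximum timestamp for each id_1, and then again in the join. It may be worth running both versions and comparing the results, timings and execution plans.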
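The uniqueness assumption matters for the join version: if one id_1 has two rows sharing its latest timestamp, the join returns both. A sketch using Python's built-in sqlite3 module as a stand-in for Oracle (timestamps stored as ISO-8601 text, which sorts chronologically; the duplicate row is made up, and the time-window filter is omitted for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table table_name (id_1 integer, utc_timestamp text, count_1 integer)"
)
conn.executemany(
    "insert into table_name values (?, ?, ?)",
    [
        (1, "2017-07-24 15:41:36", 2),
        (1, "2017-07-24 15:41:36", 0),  # hypothetical duplicate timestamp
        (1, "2017-07-24 15:41:11", 5),
        (303, "2017-07-24 15:41:55", 4),
        (303, "2017-07-24 15:41:20", 0),
    ],
)

# Same shape as the CTE + join query above.
join_back = """
    with cte as (
        select id_1, max(utc_timestamp) as max_utc_timestamp
        from table_name
        group by id_1
    )
    select t.id_1, t.utc_timestamp, t.count_1
    from cte
    join table_name t on t.id_1 = cte.id_1
                     and t.utc_timestamp = cte.max_utc_timestamp
    order by t.id_1, t.count_1
"""
result = conn.execute(join_back).fetchall()
conn.close()

for row in result:
    print(row)  # id_1 = 1 appears twice because of the tie
```

The keep (dense_rank last ...) version always returns one row per id_1, because it is a plain group by.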
With your sample data (added on 2017-07-24), the first query above, modified to match against a fixed timestamp range, gets:
ID_1 MAX(UTC_TIMESTAMP) ID_ SERVER_NAME ID_3 COUNT_1 COUNT_2
---------- --------------------------------- --- ------------ ---------- ---------- ----------
1 2017-07-24 15:41:36.000000 +00:00 ABC PQRS.ABC.TPO 2 2 3
303 2017-07-24 15:41:55.000000 +00:00 ABC PQRS.ABC.TPO 2 4 0
1461 2017-07-24 15:42:48.000000 +00:00 ABC PQRS.ABC.TPO 2 1 7
Or, dropping the columns you don't seem to be interested in:
select id_1,
max(utc_timestamp),
max(count_1) keep (dense_rank last order by utc_timestamp) as count_1,
max(count_2) keep (dense_rank last order by utc_timestamp) as count_2
from table_name
where utc_timestamp > timestamp '2017-07-24 16:40:00 Europe/London' -- current_timestamp - interval '5' minute
and utc_timestamp <= timestamp '2017-07-24 16:45:00 Europe/London' -- current_timestamp
group by id_1
order by id_1;
ID_1 MAX(UTC_TIMESTAMP) COUNT_1 COUNT_2
---------- --------------------------------- ---------- ----------
1 2017-07-24 15:41:36.000000 +00:00 2 3
303 2017-07-24 15:41:55.000000 +00:00 4 0
1461 2017-07-24 15:42:48.000000 +00:00 1 7
And then for the next step (totals across all id_1 values):
select max(max_utc_timestamp) as max_utc_timestamp,
sum(count_1) as sum_count_1,
sum(count_2) as sum_count_2
from (
select max(utc_timestamp) as max_utc_timestamp,
max(count_1) keep (dense_rank last order by utc_timestamp) as count_1,
max(count_2) keep (dense_rank last order by utc_timestamp) as count_2
from table_name
where utc_timestamp > timestamp '2017-07-24 16:40:00 Europe/London' -- current_timestamp - interval '5' minute
and utc_timestamp <= timestamp '2017-07-24 16:45:00 Europe/London' -- current_timestamp
group by id_1
);
MAX_UTC_TIMESTAMP SUM_COUNT_1 SUM_COUNT_2
--------------------------------- ----------- -----------
2017-07-24 15:42:48.000000 +00:00 7 10
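The outer query only aggregates the three per-id rows, so its result can be checked by hand: 2 + 4 + 1 = 7 and 3 + 0 + 7 = 10. The same arithmetic in Python, with the per-id rows copied from the previous output (the variable names are illustrative):

```python
# Latest (utc_timestamp, count_1, count_2) per id_1, from the per-id query output.
latest = {
    1: ("2017-07-24 15:41:36", 2, 3),
    303: ("2017-07-24 15:41:55", 4, 0),
    1461: ("2017-07-24 15:42:48", 1, 7),
}

max_utc_timestamp = max(ts for ts, _, _ in latest.values())
sum_count_1 = sum(c1 for _, c1, _ in latest.values())
sum_count_2 = sum(c2 for _, _, c2 in latest.values())

print(max_utc_timestamp, sum_count_1, sum_count_2)
```

Summing only after picking one row per id_1 is the point of the nested query; summing the raw rows in the window would count every 15-second update.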