Get rows where contact has 20 activities within a minute - SQL query
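We collect analytics data for contacts and every page they visit. Much of this data comes from malicious attacks or bots, which hit more than 20 pages of the site in under a minute. I want to purge this data once a day, but I can't figure out how to write a SQL query that selects all rows for contacts who visited more than 20 pages within a minute, not just during the last minute but at any point in the whole day. How would I write a query that returns the activity rows for contacts with more than 20 activities in one minute?
The analytics table has the columns DateCreated, ContactID, ActivityName, ActivityUrl.
Sample data (assuming a threshold of more than 3 within a minute):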
2020-07-25 23:59:58, 78, Page visit, /home
2020-07-25 23:59:57, 78, Page visit, /home/1
2020-07-25 23:59:58, 34, Page visit, /home/2
2020-07-25 23:59:56, 78, Page visit, /home/3
2020-07-25 23:59:55, 78, Page visit, /home/4
2020-07-25 23:59:52, 34, Page visit, /home
2020-07-25 23:59:52, 78, Page visit, /home/5
2020-07-25 23:59:51, 34, Page visit, /home/5
2020-07-25 23:59:50, 34, Page visit, /home/6
2020-07-25 21:34:02, 764, Page visit, /home
2020-07-25 22:11:01, 78, Page visit, /home/9
Desired data:
2020-07-25 23:59:58, 78, Page visit, /home
2020-07-25 23:59:57, 78, Page visit, /home/1
2020-07-25 23:59:56, 78, Page visit, /home/3
2020-07-25 23:59:55, 78, Page visit, /home/4
2020-07-25 23:59:52, 78, Page visit, /home/5
2020-07-25 23:59:58, 34, Page visit, /home/2
2020-07-25 23:59:52, 34, Page visit, /home
2020-07-25 23:59:51, 34, Page visit, /home/5
2020-07-25 23:59:50, 34, Page visit, /home/6
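You can do this with two levels of window functions. The first level counts the requests per ContactID and minute, then the second level takes the maximum of that count per ContactID and day. The final step is filtering: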
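select *
from (
    -- level 2: highest per-minute count for each contact and calendar day
    select
        t.*,
        max(cnt_minute) over(partition by ContactID, cast(DateCreated as date)) max_cnt_minute
    from (
        -- level 1: number of rows for each contact and calendar minute
        select
            t.*,
            count(*) over(partition by
                ContactID,
                dateadd(minute, datediff(minute, 0, DateCreated), 0)
            ) cnt_minute
        from mytable t
    ) t
) t
where max_cnt_minute > 20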
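Note that the desired output in the question lists only the rows inside the busy minute (contact 78's visit at 22:11:01 is absent), whereas the per-day maximum above flags every row that contact produced on that day. If only the burst rows are wanted, a minimal sketch along these lines may be closer. It assumes SQL Server and the same placeholder table name mytable, and it filters on the per-minute count directly:

```sql
-- Sketch (T-SQL): return only the rows that fall inside a busy minute.
-- mytable is a placeholder name; the threshold 3 matches the sample data.
select DateCreated, ContactID, ActivityName, ActivityUrl
from (
    select
        t.*,
        -- rows for this contact within the same calendar minute
        count(*) over(partition by
            ContactID,
            dateadd(minute, datediff(minute, 0, DateCreated), 0)
        ) cnt_minute
    from mytable t
) t
where cnt_minute > 3   -- use 20 for the real threshold
```

With the sample data and a threshold of 3, this should return the five 23:59 rows for contact 78 and the four 23:59 rows for contact 34. Both this sketch and the query above bucket by calendar minute, so a burst that straddles a minute boundary is split across two buckets and may fall below the threshold.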
You can easily turn this into a delete statement using an updatable CTE (which seems to be your actual intent):
with cte as (
    -- highest per-minute count for each contact and calendar day
    select
        t.*,
        max(cnt_minute) over(partition by ContactID, cast(DateCreated as date)) max_cnt_minute
    from (
        -- number of rows for each contact and calendar minute
        select
            t.*,
            count(*) over(partition by
                ContactID,
                dateadd(minute, datediff(minute, 0, DateCreated), 0)
            ) cnt_minute
        from mytable t
    ) t
)
delete from cte where max_cnt_minute > 20
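Since a delete is irreversible once committed, it may be worth doing a dry run first. The following is only a sketch, assuming SQL Server and the placeholder table name mytable: it runs the same delete inside a transaction, reports how many rows were affected, and then rolls back.

```sql
-- Dry run (T-SQL sketch): see how many rows the delete would remove.
begin transaction;

with cte as (
    select
        t.*,
        max(cnt_minute) over(partition by ContactID, cast(DateCreated as date)) max_cnt_minute
    from (
        select
            t.*,
            count(*) over(partition by
                ContactID,
                dateadd(minute, datediff(minute, 0, DateCreated), 0)
            ) cnt_minute
        from mytable t
    ) t
)
delete from cte where max_cnt_minute > 20;

-- @@rowcount holds the number of rows affected by the previous statement
select @@rowcount as deleted_rows;

rollback transaction;  -- switch to commit once the count looks right
```

The statement before `with` must end with a semicolon, which is why `begin transaction;` is terminated explicitly.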