Get rows where contact has 20 activities within a minute - SQL query

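We are collecting analytics data for contacts and every page they visit. Much of that data comes from malicious attacks or bots, which hit more than 20 pages of the site in under a minute. I want to purge this data once a day, but I can't figure out how to write a SQL query that selects all rows where the contact visited more than 20 pages within one minute, not just during the past minute but across the whole day. How would I write a query that returns the activity rows for contacts with more than 20 activities within a minute?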

The analytics table has the columns DateCreated, ContactID, ActivityName, ActivityUrl.

Sample data (assuming a threshold of more than 3 within a minute):

2020-07-25 23:59:58, 78, Page visit, /home  
2020-07-25 23:59:57, 78, Page visit, /home/1  
2020-07-25 23:59:58, 34, Page visit, /home/2  
2020-07-25 23:59:56, 78, Page visit, /home/3  
2020-07-25 23:59:55, 78, Page visit, /home/4  
2020-07-25 23:59:52, 34, Page visit, /home  
2020-07-25 23:59:52, 78, Page visit, /home/5   
2020-07-25 23:59:51, 34, Page visit, /home/5   
2020-07-25 23:59:50, 34, Page visit, /home/6        
2020-07-25 21:34:02, 764, Page visit, /home   
2020-07-25 22:11:01, 78, Page visit, /home/9    

Desired data:

2020-07-25 23:59:58, 78, Page visit, /home  
2020-07-25 23:59:57, 78, Page visit, /home/1  
2020-07-25 23:59:56, 78, Page visit, /home/3  
2020-07-25 23:59:55, 78, Page visit, /home/4   
2020-07-25 23:59:52, 78, Page visit, /home/5   
2020-07-25 23:59:58, 34, Page visit, /home/2  
2020-07-25 23:59:52, 34, Page visit, /home    
2020-07-25 23:59:51, 34, Page visit, /home/5  
2020-07-25 23:59:50, 34, Page visit, /home/6  

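You can do this with two levels of window functions. The first level counts the requests per ContactID per minute; the second level takes the maximum of that count per ContactID per day. The last step is filtering: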

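select *
from (
    select 
        t.*,
        -- level 2: busiest minute of each contact on each day
        max(cnt_minute) over(partition by ContactID, cast(DateCreated as date)) max_cnt_minute
    from (
        select 
            t.*,
            -- level 1: rows per contact per calendar minute
            -- (dateadd/datediff truncates DateCreated to the minute)
            count(*) over(partition by 
                ContactID,
                dateadd(minute, datediff(minute, 0, DateCreated), 0)
            ) cnt_minute
        from mytable t
    ) t
) t
where max_cnt_minute > 20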
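
To see what the two levels produce on the sample data from the question, here is a minimal sketch that runs the same query shape in SQLite through Python's `sqlite3` module. This is an assumption made only so the example is easy to run: the answer's `dateadd`/`datediff` syntax is SQL Server, so the minute bucket is built with SQLite's `strftime` instead, and the helper name `flagged_rows` is hypothetical. It needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Sample rows from the question: (DateCreated, ContactID, ActivityName, ActivityUrl)
ROWS = [
    ("2020-07-25 23:59:58", 78, "Page visit", "/home"),
    ("2020-07-25 23:59:57", 78, "Page visit", "/home/1"),
    ("2020-07-25 23:59:58", 34, "Page visit", "/home/2"),
    ("2020-07-25 23:59:56", 78, "Page visit", "/home/3"),
    ("2020-07-25 23:59:55", 78, "Page visit", "/home/4"),
    ("2020-07-25 23:59:52", 34, "Page visit", "/home"),
    ("2020-07-25 23:59:52", 78, "Page visit", "/home/5"),
    ("2020-07-25 23:59:51", 34, "Page visit", "/home/5"),
    ("2020-07-25 23:59:50", 34, "Page visit", "/home/6"),
    ("2020-07-25 21:34:02", 764, "Page visit", "/home"),
    ("2020-07-25 22:11:01", 78, "Page visit", "/home/9"),
]

# Same two-level structure as the answer; only the minute/day truncation
# is written with SQLite functions (an assumption, not the answer's syntax).
QUERY = """
select DateCreated, ContactID, ActivityUrl, cnt_minute, max_cnt_minute
from (
    select t.*,
           max(cnt_minute) over (
               partition by ContactID, date(DateCreated)
           ) as max_cnt_minute
    from (
        select t.*,
               count(*) over (
                   partition by ContactID,
                                strftime('%Y-%m-%d %H:%M', DateCreated)
               ) as cnt_minute
        from mytable t
    ) t
) t
where max_cnt_minute > ?
order by ContactID, DateCreated
"""


def flagged_rows(threshold):
    """Return the rows the query keeps for the given per-minute threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable ("
        "DateCreated text, ContactID integer, "
        "ActivityName text, ActivityUrl text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, (threshold,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in flagged_rows(3):
        print(row)
```

With a threshold of 3 this returns 10 rows: the four rows of contact 34 and all six rows of contact 78, including the 22:11:01 visit, because the filter is on the per-day maximum and so keeps every row of a flagged contact for that day. If only the rows inside the busy minutes are wanted, as in the desired data above, filtering on `cnt_minute` instead of `max_cnt_minute` would be one way to get that. Note also that the buckets are calendar minutes, so a burst that straddles a minute boundary is split across two counts.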

You can easily turn this into a delete statement using an updatable CTE (which seems to be your actual intent):

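with cte as (
    select 
        t.*,
        -- busiest minute of each contact on each day
        max(cnt_minute) over(partition by ContactID, cast(DateCreated as date)) max_cnt_minute
    from (
        select 
            t.*,
            -- rows per contact per calendar minute
            count(*) over(partition by 
                ContactID,
                dateadd(minute, datediff(minute, 0, DateCreated), 0)
            ) cnt_minute
        from mytable t
    ) t
)
-- deleting from the CTE removes the matching rows from mytable
delete from cte where max_cnt_minute > 20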