Inner join with special conditions

Given an hourly table A with complete records, for example:

User    Hour  Purchase 
Joe      1       0
Joe      2       0
Joe      3       0
Joe      4       1
Joe      5       0
Joe      6       0
Joe      7       0
Joe      8       1
Joe      9       1
Joe     10       0 

and a subset B of it, for example:

User    Hour    Purchase 
Joe      3         0
Joe      9         1
Joe     10         0

I want to keep only the records of A that are either in B or at most 2 hours before a record in B, without duplicates, for example:

User    Hour Purchase 
Joe      1    0
Joe      2    0
Joe      3    0
Joe      7    0
Joe      8    0
Joe      9    1 
Joe     10    0

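How can this result be achieved with an inner join, without duplicates (in this example, of hours 8 and 9) and while preserving the correct purchase values for the B hours? (This is an MWE; assume there are multiple users, and timestamps instead of hours.)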

Try this:

with _data as 
(
select 'Joe' as _user, 1 as _hour,0 as purchase union all
select 'Joe' as _user, 2 as _hour,0 as purchase union all
select 'Joe' as _user, 3 as _hour,0 as purchase union all
select 'Joe' as _user, 4 as _hour,1 as purchase union all
select 'Joe' as _user, 5 as _hour,0 as purchase union all
select 'Joe' as _user, 6 as _hour,0 as purchase union all
select 'Joe' as _user, 7 as _hour,0 as purchase union all
select 'Joe' as _user, 8 as _hour,1 as purchase union all
select 'Joe' as _user, 9 as _hour,1 as purchase union all
select 'Joe' as _user,10 as _hour,0 as purchase
)
,subset as
(
select 'Joe' as _user, 3 as _hour,0 as purchase union all
select 'Joe' as _user, 9 as _hour,1 as purchase union all
select 'Joe' as _user,10 as _hour,0 as purchase
)
select a._user, a._hour, any_value(b.purchase) as purchase
from _data a
join subset b
  on a._user = b._user
  -- keep A rows that fall on a B hour or up to 2 hours before it
  and b._hour - a._hour between 0 and 2
-- one output row per A record; any_value picks the purchase of an arbitrary matching B row
group by a._user, a._hour
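
To see why the group by is needed and what any_value(b.purchase) has to choose from, the raw join matches can be listed per A hour. This is only a minimal sketch: it assumes an in-memory SQLite database driven from Python so that it can be run anywhere, whereas the query above is written for a BigQuery-style dialect. The join condition is written as a single `between`, which is equivalent to the condition in the query.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("create table _data (_user text, _hour integer, purchase integer)")
conn.execute("create table subset (_user text, _hour integer, purchase integer)")

# Sample data from the question: purchases of Joe for hours 1..10
a_purchases = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
conn.executemany(
    "insert into _data values ('Joe', ?, ?)",
    [(h, p) for h, p in enumerate(a_purchases, start=1)],
)
conn.executemany(
    "insert into subset values ('Joe', ?, ?)",
    [(3, 0), (9, 1), (10, 0)],
)

# Join without the group by, so every (A row, B row) match is visible
rows = conn.execute(
    """
    select a._hour, b._hour, b.purchase
    from _data a
    join subset b
      on a._user = b._user
     and b._hour - a._hour between 0 and 2
    order by a._hour, b._hour
    """
).fetchall()

# Collect the matching (B hour, B purchase) pairs for each A hour
matches = defaultdict(list)
for a_hour, b_hour, b_purchase in rows:
    matches[a_hour].append((b_hour, b_purchase))

for a_hour, hits in matches.items():
    print(a_hour, hits)
```

Hours 8 and 9 each match two B rows (hours 9 and 10), which is where the duplicates come from. Those two B rows carry different purchase values, so any_value(b.purchase) may return either one for hours 8 and 9; if A's own value is what you want, any_value(a.purchase) is one option.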

Consider the simple approach below:

select * from tableA a
where exists (
  select 1 from tableB b
  where a.hour between b.hour - 2 and b.hour
  and a.user = b.user
)            

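If applied to the sample data in your question, the output is the rows of A for hours 1, 2, 3, 7, 8, 9 and 10, each exactly once.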
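
The result on the sample data can be reproduced with a small sketch. It assumes SQLite (through Python's sqlite3 module) purely as a convenient stand-in engine; the table and column names are the ones used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tableA (user text, hour integer, purchase integer)")
conn.execute("create table tableB (user text, hour integer, purchase integer)")

# Sample data from the question
a_purchases = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
conn.executemany(
    "insert into tableA values ('Joe', ?, ?)",
    [(h, p) for h, p in enumerate(a_purchases, start=1)],
)
conn.executemany(
    "insert into tableB values ('Joe', ?, ?)",
    [(3, 0), (9, 1), (10, 0)],
)

# EXISTS only tests whether a matching B row is present,
# so each A row is returned at most once
result = conn.execute(
    """
    select * from tableA a
    where exists (
      select 1 from tableB b
      where a.hour between b.hour - 2 and b.hour
      and a.user = b.user
    )
    order by a.hour
    """
).fetchall()

for row in result:
    print(row)
```

Because the selected columns all come from tableA, the purchase values are A's own (for instance, hour 8 keeps the value 1 it has in A).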

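I expect that in your real case you have a datetime / timestamp column rather than an hour column, so you will need to slightly adjust the where a.hour between b.hour - 2 and b.hour part above. It would look like: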

where a.datetime between datetime_sub(b.datetime, interval 2 hour) and b.datetime 
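
The timestamp variant can be tried out the same way. This sketch makes several assumptions: it uses SQLite, where `datetime(b.ts, '-2 hours')` plays the role that datetime_sub plays in the line above; the column is called `ts` (a hypothetical name); and timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which compares correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tableA (user text, ts text, purchase integer)")
conn.execute("create table tableB (user text, ts text, purchase integer)")


def stamp(hour):
    # Hypothetical helper: the given hour of 2024-01-01 as ISO-style text
    return f"2024-01-01 {hour:02d}:00:00"


# Same sample data as in the question, with hours turned into timestamps
a_purchases = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
conn.executemany(
    "insert into tableA values ('Joe', ?, ?)",
    [(stamp(h), p) for h, p in enumerate(a_purchases, start=1)],
)
conn.executemany(
    "insert into tableB values ('Joe', ?, ?)",
    [(stamp(3), 0), (stamp(9), 1), (stamp(10), 0)],
)

# Keep A rows whose timestamp lies in [b.ts - 2 hours, b.ts] for some B row
kept = conn.execute(
    """
    select a.user, a.ts, a.purchase from tableA a
    where exists (
      select 1 from tableB b
      where a.ts between datetime(b.ts, '-2 hours') and b.ts
      and a.user = b.user
    )
    order by a.ts
    """
).fetchall()

for row in kept:
    print(row)
```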

This is a simple INNER join with the appropriate conditions in the ON clause:

SELECT DISTINCT a.*
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour

If you want the results for a specific user, you can add a WHERE clause:

WHERE a.User = 'Joe'
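
Since the question mentions multiple users, here is a minimal sketch of the DISTINCT join with and without the WHERE clause. The assumptions are labeled: SQLite via Python as the engine, and a second user 'Ann' with made-up rows that are not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table a (user text, hour integer, purchase integer)")
conn.execute("create table b (user text, hour integer, purchase integer)")

# Joe's rows come from the question; Ann's rows are hypothetical
joe_purchases = [0, 0, 0, 1, 0, 0, 0, 1, 1, 0]
a_rows = [("Joe", h, p) for h, p in enumerate(joe_purchases, start=1)]
a_rows += [("Ann", h, 0) for h in range(1, 11)]
b_rows = [("Joe", 3, 0), ("Joe", 9, 1), ("Joe", 10, 0), ("Ann", 5, 0)]
conn.executemany("insert into a values (?, ?, ?)", a_rows)
conn.executemany("insert into b values (?, ?, ?)", b_rows)

# The user equality in ON keeps each user's B rows from matching other users;
# DISTINCT removes the duplicates produced by overlapping 2-hour windows
query = """
select distinct a.*
from a inner join b
  on b.user = a.user and a.hour between b.hour - 2 and b.hour
{where}
order by a.user, a.hour
"""

all_users = conn.execute(query.format(where="")).fetchall()
joe_only = conn.execute(query.format(where="where a.user = 'Joe'")).fetchall()

print(all_users)
print(joe_only)
```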