Inner join with special conditions
Given a table A with complete hourly records, e.g.:
User Hour Purchase
Joe 1 0
Joe 2 0
Joe 3 0
Joe 4 1
Joe 5 0
Joe 6 0
Joe 7 0
Joe 8 1
Joe 9 1
Joe 10 0
and a subset of it, B, e.g.:
User Hour Purchase
Joe 3 0
Joe 9 1
Joe 10 0
I only want to keep the records of A that are in B or at most 2 hours before a B record, without duplicates, e.g.:
User Hour Purchase
Joe 1 0
Joe 2 0
Joe 3 0
Joe 7 0
Joe 8 0
Joe 9 1
Joe 10 0
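How can I achieve this result with an inner join, without duplicates (hours 8 and 9 in this example), while keeping the correct purchase value for the B hours? (This is an MWE; assume there are multiple users, and timestamps instead of hours.)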
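The following minimal sketch shows where the duplicates come from. It loads the sample data into an in-memory SQLite database and runs a plain inner join with the 2-hour window. SQLite is only an assumption here so the example runs locally; the question itself targets BigQuery. Hours 8 and 9 each fall inside the window of both B hour 9 and B hour 10, so each of them is returned twice:

```python
import sqlite3

# Sample data from the question: A holds every hour, B is a subset of A.
A_ROWS = [("Joe", 1, 0), ("Joe", 2, 0), ("Joe", 3, 0), ("Joe", 4, 1),
          ("Joe", 5, 0), ("Joe", 6, 0), ("Joe", 7, 0), ("Joe", 8, 1),
          ("Joe", 9, 1), ("Joe", 10, 0)]
B_ROWS = [("Joe", 3, 0), ("Joe", 9, 1), ("Joe", 10, 0)]

# In-memory SQLite database standing in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
for name, rows in (("a", A_ROWS), ("b", B_ROWS)):
    conn.execute(f"CREATE TABLE {name} (user TEXT, hour INTEGER, purchase INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# A plain inner join returns one row per matching (a, b) pair, so an A hour
# inside the window of two different B hours shows up twice.
plain_join = """
    SELECT a.hour
    FROM a JOIN b
      ON b.user = a.user AND a.hour BETWEEN b.hour - 2 AND b.hour
    ORDER BY a.hour
"""
hours = [hour for (hour,) in conn.execute(plain_join)]
print(hours)  # [1, 2, 3, 7, 8, 8, 9, 9, 10]
```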
Try this:
with _data as
(
select 'Joe' as _user, 1 as _hour,0 as purchase union all
select 'Joe' as _user, 2 as _hour,0 as purchase union all
select 'Joe' as _user, 3 as _hour,0 as purchase union all
select 'Joe' as _user, 4 as _hour,1 as purchase union all
select 'Joe' as _user, 5 as _hour,0 as purchase union all
select 'Joe' as _user, 6 as _hour,0 as purchase union all
select 'Joe' as _user, 7 as _hour,0 as purchase union all
select 'Joe' as _user, 8 as _hour,1 as purchase union all
select 'Joe' as _user, 9 as _hour,1 as purchase union all
select 'Joe' as _user,10 as _hour,0 as purchase
)
,subset as
(
select 'Joe' as _user, 3 as _hour,0 as purchase union all
select 'Joe' as _user, 9 as _hour,1 as purchase union all
select 'Joe' as _user,10 as _hour,0 as purchase
)
select a._user, a._hour, any_value(b.purchase) as purchase
from _data a
join subset b
  on a._user = b._user
  -- keep A hours that equal a B hour or precede it by at most 2 hours
  and a._hour between b._hour - 2 and b._hour
-- collapse A hours matched by more than one B row (hours 8 and 9 here)
group by a._user, a._hour
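Here is a rough local check of the grouped query above, again using SQLite as a stand-in for BigQuery. As far as I know, SQLite does not provide ANY_VALUE, so MAX is used instead. That also makes the chosen B value deterministic. Note that the purchase column comes from a matching B row rather than from A. As a result, hour 7 reports B hour 9's purchase (1), even though A has 0 for hour 7:

```python
import sqlite3

# Same values as the _data and subset CTEs in the answer above.
A_ROWS = [("Joe", 1, 0), ("Joe", 2, 0), ("Joe", 3, 0), ("Joe", 4, 1),
          ("Joe", 5, 0), ("Joe", 6, 0), ("Joe", 7, 0), ("Joe", 8, 1),
          ("Joe", 9, 1), ("Joe", 10, 0)]
B_ROWS = [("Joe", 3, 0), ("Joe", 9, 1), ("Joe", 10, 0)]

conn = sqlite3.connect(":memory:")
for name, rows in (("_data", A_ROWS), ("subset", B_ROWS)):
    conn.execute(f"CREATE TABLE {name} (_user TEXT, _hour INTEGER, purchase INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# MAX stands in for BigQuery's ANY_VALUE (assumed unavailable in SQLite).
# GROUP BY collapses A hours matched by several B rows into a single row.
query = """
    SELECT a._user, a._hour, MAX(b.purchase) AS purchase
    FROM _data a
    JOIN subset b
      ON a._user = b._user
     AND a._hour BETWEEN b._hour - 2 AND b._hour
    GROUP BY a._user, a._hour
    ORDER BY a._hour
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```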
Consider the following simple approach:
select * from tableA a
where exists (
select 1 from tableB b
where a.hour between b.hour - 2 and b.hour
and a.user = b.user
)
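Below is a minimal sketch of the EXISTS query run against the question's sample data, again assuming an in-memory SQLite database instead of BigQuery. Because EXISTS only tests whether a match exists, each A row is returned at most once, with its own purchase value from A:

```python
import sqlite3

# Sample data from the question, loaded into tables named as in the answer.
A_ROWS = [("Joe", 1, 0), ("Joe", 2, 0), ("Joe", 3, 0), ("Joe", 4, 1),
          ("Joe", 5, 0), ("Joe", 6, 0), ("Joe", 7, 0), ("Joe", 8, 1),
          ("Joe", 9, 1), ("Joe", 10, 0)]
B_ROWS = [("Joe", 3, 0), ("Joe", 9, 1), ("Joe", 10, 0)]

conn = sqlite3.connect(":memory:")
for name, rows in (("tableA", A_ROWS), ("tableB", B_ROWS)):
    conn.execute(f"CREATE TABLE {name} (user TEXT, hour INTEGER, purchase INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# EXISTS keeps an A row if at least one B row of the same user lies
# 0 to 2 hours after it; it never multiplies A rows.
query = """
    SELECT * FROM tableA a
    WHERE EXISTS (
        SELECT 1 FROM tableB b
        WHERE a.hour BETWEEN b.hour - 2 AND b.hour
          AND a.user = b.user
    )
    ORDER BY a.hour
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```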
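Applied to the sample in your question, this returns hours 1, 2, 3, 7, 8, 9 and 10 for Joe, each exactly once, with their purchase values from A.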
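I expect that in your real case you have a datetime or timestamp column instead of hour, so you will need to slightly modify the where a.hour between b.hour - 2 and b.hour part above. It would look like this (for a TIMESTAMP column, use timestamp_sub instead of datetime_sub):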
where a.datetime between datetime_sub(b.datetime, interval 2 hour) and b.datetime
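Here is a sketch of the timestamp variant in SQLite, where datetime(b.ts, '-2 hours') plays the role of datetime_sub. The ts column, the 'YYYY-MM-DD HH:MM:SS' text format and the date 2024-01-01 are all hypothetical. BETWEEN works here because ISO-formatted timestamp strings sort correctly as text:

```python
import sqlite3


def stamp(hour: int) -> str:
    """Hypothetical timestamp: map an hour from the question onto 2024-01-01."""
    return f"2024-01-01 {hour:02d}:00:00"


A_ROWS = [("Joe", stamp(h), p) for h, p in [(1, 0), (2, 0), (3, 0), (4, 1), (5, 0),
                                            (6, 0), (7, 0), (8, 1), (9, 1), (10, 0)]]
B_ROWS = [("Joe", stamp(3), 0), ("Joe", stamp(9), 1), ("Joe", stamp(10), 0)]

conn = sqlite3.connect(":memory:")
for name, rows in (("tableA", A_ROWS), ("tableB", B_ROWS)):
    conn.execute(f"CREATE TABLE {name} (user TEXT, ts TEXT, purchase INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# datetime(b.ts, '-2 hours') shifts the B timestamp back by two hours,
# so the window is [b.ts - 2 hours, b.ts], inclusive on both ends.
query = """
    SELECT a.user, a.ts, a.purchase
    FROM tableA a
    WHERE EXISTS (
        SELECT 1 FROM tableB b
        WHERE a.user = b.user
          AND a.ts BETWEEN datetime(b.ts, '-2 hours') AND b.ts
    )
    ORDER BY a.ts
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```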
This is a simple INNER join with the appropriate conditions in the ON clause:
SELECT DISTINCT a.*
FROM a INNER JOIN b
ON b.User = a.User AND a.Hour BETWEEN b.Hour - 2 AND b.Hour
If you want the results for a specific user, you can add a WHERE clause:
WHERE a.User = 'Joe'
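The following minimal SQLite sketch runs the DISTINCT join with and without the WHERE filter. The second user, Ann, is hypothetical. It is included only to show that the b.User = a.User condition keeps each user's window separate. DISTINCT then removes the repeated hour 8 and hour 9 rows, because those duplicates are identical copies of the same A row:

```python
import sqlite3

# Question's data for Joe plus a hypothetical second user, Ann.
A_ROWS = [("Joe", 1, 0), ("Joe", 2, 0), ("Joe", 3, 0), ("Joe", 4, 1),
          ("Joe", 5, 0), ("Joe", 6, 0), ("Joe", 7, 0), ("Joe", 8, 1),
          ("Joe", 9, 1), ("Joe", 10, 0), ("Ann", 1, 0), ("Ann", 2, 1)]
B_ROWS = [("Joe", 3, 0), ("Joe", 9, 1), ("Joe", 10, 0), ("Ann", 2, 1)]

conn = sqlite3.connect(":memory:")
for name, rows in (("a", A_ROWS), ("b", B_ROWS)):
    conn.execute(f"CREATE TABLE {name} (user TEXT, hour INTEGER, purchase INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# DISTINCT collapses the identical A rows produced by overlapping windows.
BASE = """
    SELECT DISTINCT a.*
    FROM a INNER JOIN b
      ON b.user = a.user AND a.hour BETWEEN b.hour - 2 AND b.hour
"""
all_rows = conn.execute(BASE + " ORDER BY user, hour").fetchall()
# The optional WHERE clause restricts the result to a single user.
joe_rows = conn.execute(BASE + " WHERE a.user = ? ORDER BY hour", ("Joe",)).fetchall()
print(all_rows)
print(joe_rows)
```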