SQL Select Inner join one by one
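I have a specific requirement for my database (PostgreSQL v9.4.5) and I can't see any elegant pure-SQL solution to it (I know I could do it with Python or something else, but I have several billion rows of data and the computation time would increase greatly).
I have two tables: trades and events. Both tables represent trades occurring in an order book during a day (which is why I have several billion rows; my data spans several years), but events has many more rows than trades.
Both tables have time, price and volume columns, and each table also has other columns with table-specific information (say foo and bar respectively).
I want to build a correspondence between the two tables on the time, price and volume columns, knowing that this correspondence exists as an injection from trades to events (if trades has n rows with the same time t, the same price p and the same volume v, I know that events also has n rows with time t, price p and volume v).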
trades:
id | time | price | volume | foo
-----+-----------+---------+--------+-------
201 | 32400.524 | 53 | 2085 | xxx
202 | 32400.530 | 53 | 1162 | xxx
203 | 32400.531 | 52.99 | 50 | xxx
204 | 32400.532 | 52.91 | 3119 | xxx
205 | 32400.837 | 52.91 | 3119 | xxx <--
206 | 32400.837 | 52.91 | 3119 | xxx <--
207 | 32400.837 | 52.91 | 3119 | xxx <--
208 | 32400.839 | 52.92 | 3220 | xxx <--
209 | 32400.839 | 52.92 | 3220 | xxx <--
210 | 32400.839 | 52.92 | 3220 | xxx <--
events:
id | time | price | volume | bar
-----+-----------+---------+--------+------
328 | 32400.835 | 52.91 | 3119 | yyy
329 | 32400.837 | 52.91 | 3119 | yyy <--
330 | 32400.837 | 52.91 | 3119 | yyy <--
331 | 32400.837 | 52.91 | 3119 | yyy <--
332 | 32400.838 | 52.91 | 3119 | yyy
333 | 32400.838 | 52.91 | 3119 | yyy
334 | 32400.839 | 52.92 | 3220 | yyy <--
335 | 32400.839 | 52.92 | 3220 | yyy <--
336 | 32400.839 | 52.92 | 3220 | yyy <--
337 | 32400.840 | 52.91 | 2501 | yyy
What I want is:
time | price | volume | foo | bar
-----------+---------+--------+------+-------
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
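I can't do a classic INNER JOIN, otherwise I will get every possible pairing between the matching rows of the two tables (in this case each group of 3 identical rows would give 3x3 = 9 rows, so 18 rows instead of 6).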
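There may be several rows with the same values, but I want each row matched with exactly one row.
Thanks for your help.
EDIT:
As I said, if I use a classic INNER JOIN, for example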
SELECT * FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
I would get something like this:
trade_id | event_id | time | price | volume | foo | bar
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
208 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
208 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
But what I want is:
trade_id | event_id | time | price | volume | foo | bar
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
Check this query -
SELECT Events.*,Trades.*
FROM Events
INNER JOIN Trades
ON Trades.time = Events.time
AND Trades.price = Events.price
AND Trades.volume = Events.volume
Try this and let me know if it works. We could also use a row_number() over(partition by) clause, but I'm not sure whether it works in PostgreSQL. Anyway, try this.
SELECT
min(t.id) as trade_id,min(e.id) as event_id,
min(t.time) as time,min(t.price) as price,
min(t.volume) as volume, min(e.bar) as bar,
min(t.foo) as foo
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
group by t.id
Looking at the sample data you provided, one option is:
SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo)
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY e.id, e.time, e.price, e.volume
Here is my example using row_number.
Also, SQL Fiddle: SO 33608351
with
trades AS
(
select 201 as id, 32400.524 as time, 53 as price, 2085 as volume, 'xxx' as foo union all
select 202, 32400.530, 53, 1162, 'xxx' union all
select 203, 32400.531, 52.99, 50, 'xxx' union all
select 204, 32400.532, 52.91, 3119, 'xxx' union all
select 205, 32400.837, 52.91, 3119, 'xxx' union all
select 206, 32400.837, 52.91, 3119, 'xxx' union all
select 207, 32400.837, 52.91, 3119, 'xxx' union all
select 208, 32400.839, 52.92, 3220, 'xxx' union all
select 209, 32400.839, 52.92, 3220, 'xxx' union all
select 210, 32400.839, 52.92, 3220, 'xxx'
),
events as
(
select 328 as id, 32400.835 as time , 52.91 as price , 3119 as volume , 'yyy' as bar union all
select 329 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 330 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 331 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 332 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 333 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 334 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 335 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 336 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 337 , 32400.840 , 52.91 , 2501 , 'yyy'
),
tradesWithRowNumber AS
(
select *
,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
from trades
),
eventsWithRowNumber AS
(
select *
,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
from events
)
select t.time,
t.price,
t.volume,
t.foo,
e.bar
FROM tradesWithRowNumber t
inner JOIN
eventsWithRowNumber e on e.time = t.time
AND e.price = t.price
AND e.volume = t.volume
and e.RowNum = t.RowNum
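If I understand correctly, you just want to list the foo and bar columns without creating a Cartesian product. To do this, you can use row_number() to introduce a new column and join on it: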
SELECT *
FROM (SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
FROM events e
) e INNER JOIN
(SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
FROM trades t
) t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
t.seqnum = e.seqnum;
It is unclear from your question whether you want an inner join, a left outer join, or a full outer join.
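To illustrate the difference the join type makes, here is a minimal sketch of the left outer variant of the row_number() query above. It is only an illustration under assumptions: the two VALUES CTEs stand in for the real trades and events tables and contain just a few rows copied from the question, and the CTE names t and e are chosen for this example.

```sql
-- Sketch only: a small subset of the sample rows from the question.
-- In a real database, drop these two CTEs and use the actual tables.
WITH trades (id, time, price, volume, foo) AS (
    VALUES (205, 32400.837, 52.91, 3119, 'xxx'),
           (206, 32400.837, 52.91, 3119, 'xxx'),
           (207, 32400.837, 52.91, 3119, 'xxx')
),
events (id, time, price, volume, bar) AS (
    VALUES (328, 32400.835, 52.91, 3119, 'yyy'),
           (329, 32400.837, 52.91, 3119, 'yyy'),
           (330, 32400.837, 52.91, 3119, 'yyy'),
           (331, 32400.837, 52.91, 3119, 'yyy'),
           (332, 32400.838, 52.91, 3119, 'yyy')
),
-- Number the rows inside each (time, price, volume) group, ordered by id,
-- so that the k-th trade of a group pairs with the k-th event of that group.
t AS (
    SELECT trades.*,
           ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
    FROM trades
),
e AS (
    SELECT events.*,
           ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
    FROM events
)
-- LEFT JOIN keeps every event; events without a matching trade get NULLs.
SELECT e.id AS event_id, t.id AS trade_id,
       e.time, e.price, e.volume, t.foo, e.bar
FROM e
LEFT JOIN t
       ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
      AND t.seqnum = e.seqnum
ORDER BY e.id;
```

With these rows, events 329, 330 and 331 pair with trades 205, 206 and 207, while events 328 and 332 come back with NULL in trade_id and foo. Replacing LEFT JOIN with INNER JOIN drops those two unmatched rows, which is the result asked for in the question.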