SQL Select Inner join one by one

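I have a specific requirement for my database (PostgreSQL v9.4.5), and I don't see any elegant pure-SQL solution for it (I know I could do it in Python or something else, but I have several billion rows of data, so the computation time would increase a lot).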

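I have two tables: trades and events. Both tables represent trades occurring in an order book during a day (which is why I have several billion rows; my data spans several years), but there are far more events than trades.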

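Both tables have the columns time, price and volume, but each has other columns with specific information (say foo and bar respectively). I want to establish a correspondence between the two tables on the columns time, price and volume, as I know this correspondence exists as an injection from trades to events (if there are n rows in trades with the same time t, the same price p and the same volume v, I know there are also n rows in events with time t, price p and volume v).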

Trades:

  id |   time    |  price  | volume |   foo
-----+-----------+---------+--------+-------
 201 | 32400.524 |      53 |   2085 |   xxx
 202 | 32400.530 |      53 |   1162 |   xxx
 203 | 32400.531 |   52.99 |     50 |   xxx
 204 | 32400.532 |   52.91 |   3119 |   xxx
 205 | 32400.837 |   52.91 |   3119 |   xxx <--
 206 | 32400.837 |   52.91 |   3119 |   xxx <--
 207 | 32400.837 |   52.91 |   3119 |   xxx <--
 208 | 32400.839 |   52.92 |   3220 |   xxx <--
 209 | 32400.839 |   52.92 |   3220 |   xxx <--
 210 | 32400.839 |   52.92 |   3220 |   xxx <--

Events:

  id |   time    |  price  | volume |  bar 
-----+-----------+---------+--------+------
 328 | 32400.835 |   52.91 |   3119 |  yyy
 329 | 32400.837 |   52.91 |   3119 |  yyy <--
 330 | 32400.837 |   52.91 |   3119 |  yyy <--
 331 | 32400.837 |   52.91 |   3119 |  yyy <--
 332 | 32400.838 |   52.91 |   3119 |  yyy
 333 | 32400.838 |   52.91 |   3119 |  yyy
 334 | 32400.839 |   52.92 |   3220 |  yyy <--
 335 | 32400.839 |   52.92 |   3220 |  yyy <--
 336 | 32400.839 |   52.92 |   3220 |  yyy <--
 337 | 32400.840 |   52.91 |   2501 |  yyy

What I want is:

   time    |  price  | volume |  foo |   bar 
-----------+---------+--------+------+-------
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy

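I cannot do a classical INNER JOIN, otherwise I will get every possible combination of identical rows between the two tables (in this case 3x3 rows for each of the two groups, so 18 rows instead of 6).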

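There can be several identical rows, but I want only a one-row-to-one-row correspondence.

Thanks for your help.

EDIT:

As I said, if I use a classical INNER JOIN, for example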

SELECT * FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume

I would get something like this:

trade_id | event_id |   time    |  price  | volume |  foo |   bar 
---------+----------+-----------+---------+--------+------+-------
  205    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  205    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  205    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  208    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  208    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  208    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy

But what I want is:

trade_id | event_id |   time    |  price  | volume |  foo |   bar 
---------+----------+-----------+---------+--------+------+-------
  205    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  208    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy

Check this query:

SELECT Events.*,Trades.*
FROM Events
INNER JOIN Trades
ON Trades.time = Events.time
AND Trades.price = Events.price
AND Trades.volume = Events.volume

Try this and let me know if it works. We could also use a row_number() over (partition by ...) clause, but I'm not sure whether it is available in PostgreSQL. Anyway, try this.

SELECT 
  min(t.id) as trade_id,min(e.id) as event_id,
  min(t.time) as time,min(t.price) as price,
  min(t.volume) as volume,  min(e.bar) as bar,
  min(t.foo) as foo 
FROM events e
  INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
group by t.id

Looking at the sample data you provided, one option is:

SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo)
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY e.id, e.time, e.price, e.volume

Here is my example using row_number.

Also, SQL Fiddle: SO 33608351

with 
trades AS
(
    select 201 as id, 32400.524 as time, 53 as price,       2085 as volume, 'xxx' as foo union all
    select 202, 32400.530, 53,      1162,   'xxx' union all
    select 203, 32400.531, 52.99,       50,     'xxx' union all
    select 204, 32400.532, 52.91,       3119,   'xxx' union all
    select 205, 32400.837, 52.91,       3119,   'xxx' union all
    select 206, 32400.837, 52.91,       3119,   'xxx' union all
    select 207, 32400.837, 52.91,       3119,   'xxx' union all
    select 208, 32400.839, 52.92,       3220,   'xxx' union all
    select 209, 32400.839, 52.92,       3220,   'xxx' union all
    select 210, 32400.839, 52.92,       3220,   'xxx'
),
events as
(
    select 328 as id, 32400.835 as time ,   52.91 as price ,   3119 as volume ,  'yyy' as bar union all
    select 329 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 330 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 331 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 332 , 32400.838 ,   52.91 ,   3119 ,  'yyy' union all
    select 333 , 32400.838 ,   52.91 ,   3119 ,  'yyy' union all
    select 334 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 335 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 336 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 337 , 32400.840 ,   52.91 ,   2501 ,  'yyy'
),
tradesWithRowNumber AS
(
    select   *
            ,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
    from trades
),
eventsWithRowNumber AS
(
    select   *
            ,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
    from events
)
select  t.time,
        t.price,
        t.volume,
        t.foo,
        e.bar
FROM    tradesWithRowNumber t
        inner JOIN
        eventsWithRowNumber e   on  e.time = t.time
                                AND e.price = t.price
                                AND e.volume = t.volume
                                and e.RowNum = t.RowNum
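
Because the inner join silently drops any trade that has no event with the same row number, it may be worth checking the injection assumption from the question first. The following is only a sketch: the inline sample rows and the CTE names trade_counts and event_counts are made up for illustration. It lists every (time, price, volume) key that has more trades than events.

```sql
-- Hypothetical sample data: the second key has 2 trades but only 1 event.
WITH trades (id, time, price, volume) AS (
    VALUES (205, 32400.837, 52.91, 3119),
           (208, 32400.839, 52.92, 3220),
           (209, 32400.839, 52.92, 3220)
),
events (id, time, price, volume) AS (
    VALUES (329, 32400.837, 52.91, 3119),
           (330, 32400.837, 52.91, 3119),
           (334, 32400.839, 52.92, 3220)
),
-- Number of trades per (time, price, volume) key.
trade_counts AS (
    SELECT time, price, volume, COUNT(*) AS n_trades
    FROM trades
    GROUP BY time, price, volume
),
-- Number of events per (time, price, volume) key.
event_counts AS (
    SELECT time, price, volume, COUNT(*) AS n_events
    FROM events
    GROUP BY time, price, volume
)
-- Keys where at least one trade would be left without an event.
SELECT tc.time, tc.price, tc.volume,
       tc.n_trades,
       COALESCE(ec.n_events, 0) AS n_events
FROM trade_counts tc
LEFT JOIN event_counts ec
       ON  ec.time   = tc.time
       AND ec.price  = tc.price
       AND ec.volume = tc.volume
WHERE tc.n_trades > COALESCE(ec.n_events, 0);
```

With these sample rows the query returns only the 32400.839 key (2 trades, 1 event). An empty result on the real tables would mean every trade can be paired with its own event.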

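If I understand correctly, you just want to list the foo and bar columns without creating a Cartesian product. To do that, you can introduce a new column with row_number() and join on it as well: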

SELECT *
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
      FROM events e
     ) e INNER JOIN
     (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
      FROM trades t
     ) t
     ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
        t.seqnum = e.seqnum;

Your question is not clear on whether you want an inner join, a left outer join, or a full outer join.
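
The choice matters when the two tables do not line up perfectly. As a minimal sketch (the inline sample rows and the CTE names trades_numbered and events_numbered are invented for illustration), the same row_number() pairing with a FULL OUTER JOIN keeps the unmatched rows from either side, with NULLs in the columns of the missing side:

```sql
-- Hypothetical sample data: two identical trades, three identical events,
-- plus one event (328) whose time matches no trade at all.
WITH trades (id, time, price, volume, foo) AS (
    VALUES (205, 32400.837, 52.91, 3119, 'xxx'),
           (206, 32400.837, 52.91, 3119, 'xxx')
),
events (id, time, price, volume, bar) AS (
    VALUES (328, 32400.835, 52.91, 3119, 'yyy'),
           (329, 32400.837, 52.91, 3119, 'yyy'),
           (330, 32400.837, 52.91, 3119, 'yyy'),
           (331, 32400.837, 52.91, 3119, 'yyy')
),
-- Number the identical rows inside each (time, price, volume) group.
trades_numbered AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
    FROM trades t
),
events_numbered AS (
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
    FROM events e
)
-- Pair the n-th trade of a group with the n-th event of the same group,
-- keeping rows that have no partner on the other side.
SELECT t.id AS trade_id,
       e.id AS event_id,
       COALESCE(t.time, e.time)     AS time,
       COALESCE(t.price, e.price)   AS price,
       COALESCE(t.volume, e.volume) AS volume,
       t.foo,
       e.bar
FROM trades_numbered t
FULL OUTER JOIN events_numbered e
       ON  e.time   = t.time
       AND e.price  = t.price
       AND e.volume = t.volume
       AND e.seqnum = t.seqnum
ORDER BY e.id;
```

Here trades 205 and 206 are paired with events 329 and 330, while events 328 and 331 come back with a NULL trade_id and a NULL foo. Replacing FULL OUTER JOIN with INNER JOIN gives only the two paired rows, and a LEFT JOIN from trades_numbered would keep every trade while dropping the unmatched events.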