SQL interpolate missing dates

Using SQL Server 2012, I have a table called Allbucket:

CustodianAccountNum  symbol  EndDate     ManagerName       MarketValue  NetReturn
A9G040819            wabix   12/31/2013  GMO Benchmark     34751.10987  0.004072
A9G040819            wabix   1/31/2014   GMO Benchmark     34128.88767  -0.017905
A9G040819            wabix   2/28/2014   GMO Benchmark     49969.8081   0.0202
A9G040819            wabix   3/31/2014   GMO Benchmark     50370.993    0.008028
A9G040819            wabix   4/30/2014   GMO Benchmark     50995.0584   0.012389
A9G040819            amj     12/31/2013  JPMorgan Alerian  1234.55      -0.008154
A9G040819            amj     2/28/2014   JPMorgan Alerian  14849.76     -0.018599
A9G040819            amj     3/31/2014   JPMorgan Alerian  14892.8      0.015203
A9G040819            amj     4/30/2014   JPMorgan Alerian  15513.6      0.041684

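I am trying to load this data from one system into another. However, the target system requires that, for each given CustodianAccountNum, all symbols have the same date intervals over the time spans in which they exist.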

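Note that amj is missing for 1/31/2014. The clue is that at least one other security, in this case wabix, has that date within the same time span. Also note that dates are sometimes intra-month, for example 1/15/2014.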

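I was hoping to do something like a self join and partition, in which I get all possible distinct dates for a given CustodianAccountNum and then force all rows to have the same periodicity over the time spans in which they overlap. For the non-original, interpolated rows, whose dates are 'borrowed' from another symbol that exists within that time span, I want to pull the lagged MarketValue from that symbol's previous row (or 0 if no previous row exists) and force all other values to zero. There are other columns in the original data, but I tried to keep this example simple.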

So ideally amj would look like this, because wabix has a row dated 1/31/2014:

CustodianAccountNum  symbol  EndDate     ManagerName       MarketValue  NetReturn
A9G040819            amj     12/31/2013  JPMorgan Alerian  1234.55      -0.008154
A9G040819            amj     1/31/2014   JPMorgan Alerian  1234.55      -0.0
A9G040819            amj     2/28/2014   JPMorgan Alerian  14849.76     -0.018599
A9G040819            amj     3/31/2014   JPMorgan Alerian  14892.8      0.015203
A9G040819            amj     4/30/2014   JPMorgan Alerian  15513.6      0.041684

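The guiding principle for a missing date is whether any other symbol has that date, partitioned by the given custodian account. There are thousands of different account numbers, but symbols only need to align within a given account number.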

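I only care about the date intervals within each symbol's lifetime, per account. If another symbol predates it by years, I do not need to add many months of zeros. I only need each symbol synchronized, from its first date to its last date, with all the symbols that overlap it in time.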

Update

Gordon Linoff's reply got me close, but not quite there. I had to change OUTER APPLY to CROSS APPLY; otherwise I got thousands of records that were NULL in every column.

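I modified the query to show all the needed columns, but this query results in every column except MarketValue being 0. Basically, I want to force all values of a derived row (1/31/2014 in my example) to 0, except MarketValue, which I want to pull from the previous market value. For all non-derived rows, however, I want the original values across the entire row.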

select
   ab.drank, d.EndDate, ab.BranchName, ab.EntityID, ab.CustodianAccountNum, ab.AccountID, ab.ManagerName,
   ab.FTAssetStyle, ab.FTAssetClass, ab.PWMSecurityID, ab.AssetClassCode, ab.AssetClass, ab.Symbol, ab.SecType,
   ab.Cusip, ab.Held, ab.MarketValue,
   0 AS GrossFlow, 0 AS GrossWeight, 0 AS GrossReturn, 0 AS NetFlow, 0 AS NetWeight,
   0 AS NetReturn, 0 AS PortfolioFees, 0 AS PortfolioExpenses, 0 AS ManagerFees, 0 AS Income
from (select distinct CustodianAccountNum, enddate from Allbucket) d join
   (select distinct CustodianAccountNum, symbol from Allbucket) s
   on d.CustodianAccountNum = s.CustodianAccountNum CROSS apply
   (select top 1 ab.*
   from Allbucket ab
   where d.CustodianAccountNum = ab.CustodianAccountNum and
      d.enddate <= ab.enddate and
      s.symbol = ab.symbol
      AND ab.CustodianAccountNum = 'A9G040819'
   order by d.enddate desc
   ) ab

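You can basically use a cross join to generate the rows. In this case it is really a join of the distinct dates and the distinct symbols on CustodianAccountNum, but within each account it is still a Cartesian product.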

Then the most recent record for each CustodianAccountNum / symbol / EndDate combination can be selected using outer apply.
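As a rough sketch of the row-generation step on its own, assuming only the Allbucket columns shown in the question (the ORDER BY is added purely for readability), the join of distinct dates and distinct symbols within each account might look like this:

```sql
-- Sketch: one row per (account, symbol, date) combination,
-- whether or not the symbol actually has data on that date.
select d.CustodianAccountNum, s.symbol, d.enddate
from (select distinct CustodianAccountNum, enddate from Allbucket) d join
     (select distinct CustodianAccountNum, symbol from Allbucket) s
     on d.CustodianAccountNum = s.CustodianAccountNum
order by d.CustodianAccountNum, s.symbol, d.enddate;
```

Against the nine sample rows in the question this should produce ten rows, five dates for each of the two symbols, including amj on 1/31/2014. Note that this step does not yet limit each symbol to its own first-to-last date range.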

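The following is a slight variation. It uses a left join to bring in the matching record, and then uses information from the two records when there is no match. I am not sure which columns should be 0, but the idea is: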

select s.CustodianAccountNum, s.symbol, d.EndDate, ab.ManagerName,
       ab.MarketValue, 0 as NetReturn,
       ab.xxx,                      -- for columns that come from the current row
       coalesce(ab.yyy, abprev.yyy) -- for columns that fall back to the previous row
from (select distinct CustodianAccountNum, enddate from Allbucket) d join
     (select distinct CustodianAccountNum, symbol from Allbucket) s
     on d.CustodianAccountNum = s.CustodianAccountNum left join
     Allbucket ab
     on d.CustodianAccountNum = ab.CustodianAccountNum and
        d.enddate = ab.enddate and     -- exact match; NULL when the symbol has no row on this date
        s.symbol = ab.symbol outer apply
     (select top 1 p.*
      from Allbucket p
      where d.CustodianAccountNum = p.CustodianAccountNum and
            p.enddate < d.enddate and  -- strictly earlier rows of the same symbol
            s.symbol = p.symbol
      order by p.enddate desc          -- latest of those, i.e. the previous row
     ) abprev

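A slightly different approach, but still using a Cartesian product and the APPLY operator (this approach needs OUTER APPLY). To get 0 where you do not want the previous value carried forward, just modify the COALESCE() accordingly.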

SQL Fiddle

MS SQL Server 2014 Schema Setup:

CREATE TABLE Allbucket
    ([CustodianAccountNum] varchar(9), [symbol] varchar(5), [EndDate] datetime, [ManagerName] varchar(16), [MarketValue] numeric
     , [NetReturn] decimal(12,6))
;

INSERT INTO Allbucket
    ([CustodianAccountNum], [symbol], [EndDate], [ManagerName], [MarketValue], [NetReturn])
VALUES
    ('A9G040819', 'wabix', '2013-12-31 00:00:00', 'GMO Benchmark', 34751.10987, 0.004072),
    ('A9G040819', 'wabix', '2014-01-31 00:00:00', 'GMO Benchmark', 34128.88767, -0.017905),
    ('A9G040819', 'wabix', '2014-02-28 00:00:00', 'GMO Benchmark', 49969.8081, 0.0202),
    ('A9G040819', 'wabix', '2014-03-31 00:00:00', 'GMO Benchmark', 50370.993, 0.008028),
    ('A9G040819', 'wabix', '2014-04-30 00:00:00', 'GMO Benchmark', 50995.0584, 0.012389),
    ('A9G040819', 'amj', '2013-12-31 00:00:00', 'JPMorgan Alerian', 1234.55, -0.008154),
    ('A9G040819', 'amj', '2014-02-28 00:00:00', 'JPMorgan Alerian', 14849.76, -0.018599),
    ('A9G040819', 'amj', '2014-03-31 00:00:00', 'JPMorgan Alerian', 14892.8, 0.015203),
    ('A9G040819', 'amj', '2014-04-30 00:00:00', 'JPMorgan Alerian', 15513.6, 0.041684)
;

Query 1:

SELECT
      s.CustodianAccountNum
    , s.symbol
    , d.enddate
    , COALESCE(ab.ManagerName, ap.ManagerName) AS ManagerName
    , COALESCE(ab.MarketValue, ap.MarketValue) AS MarketValue
    , COALESCE(ab.NetReturn, 0) AS NetReturn
FROM (
      SELECT
            CustodianAccountNum
          , symbol
          , MIN(enddate) symstart
          , MAX(enddate) symend
      FROM Allbucket
      GROUP BY
            CustodianAccountNum
          , symbol
      ) s
      JOIN (
            SELECT DISTINCT
                  cast(enddate as date) as enddate
            FROM Allbucket
      ) d ON d.enddate BETWEEN s.symstart AND s.symend
      LEFT JOIN Allbucket ab ON s.CustodianAccountNum = ab.CustodianAccountNum
                  AND s.symbol = ab.symbol
                  AND ab.enddate = d.enddate
      OUTER APPLY (
            SELECT TOP 1
                  t.*
            FROM Allbucket t
            WHERE s.CustodianAccountNum = t.CustodianAccountNum
                  AND s.symbol = t.symbol
                  AND d.enddate <= t.enddate
            ORDER BY
                  d.enddate DESC
      ) ap

Results:

| CustodianAccountNum | symbol |    enddate |      ManagerName | MarketValue | NetReturn |
|---------------------|--------|------------|------------------|-------------|-----------|
|           A9G040819 |    amj | 2013-12-31 | JPMorgan Alerian |        1235 | -0.008154 |
|           A9G040819 |    amj | 2014-01-31 | JPMorgan Alerian |       14850 |         0 |
|           A9G040819 |    amj | 2014-02-28 | JPMorgan Alerian |       14850 | -0.018599 |
|           A9G040819 |    amj | 2014-03-31 | JPMorgan Alerian |       14893 |  0.015203 |
|           A9G040819 |    amj | 2014-04-30 | JPMorgan Alerian |       15514 |  0.041684 |
|           A9G040819 |  wabix | 2013-12-31 |    GMO Benchmark |       34751 |  0.004072 |
|           A9G040819 |  wabix | 2014-01-31 |    GMO Benchmark |       34129 | -0.017905 |
|           A9G040819 |  wabix | 2014-02-28 |    GMO Benchmark |       49970 |    0.0202 |
|           A9G040819 |  wabix | 2014-03-31 |    GMO Benchmark |       50371 |  0.008028 |
|           A9G040819 |  wabix | 2014-04-30 |    GMO Benchmark |       50995 |  0.012389 |

nb: you could use ISNULL() instead of COALESCE()
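One thing to watch in the results above: the filled-in amj row for 2014-01-31 shows 14850, which is the 2014-02-28 value, not the earlier one. The apply filters on `d.enddate <= t.enddate`, which keeps rows dated on or after the generated date, and `ORDER BY d.enddate` cannot rank them because that value is constant inside the apply. If the previous row's MarketValue is what is wanted, as the question describes, a variant along the following lines might work. This is a sketch under the same schema, not a tested drop-in; it also scopes the date list per CustodianAccountNum, on the assumption that dates from other accounts should not leak in.

```sql
-- Sketch only: same Allbucket table as in the schema setup above.
SELECT
      s.CustodianAccountNum
    , s.symbol
    , d.enddate
    , COALESCE(ab.ManagerName, ap.ManagerName) AS ManagerName
    , COALESCE(ab.MarketValue, ap.MarketValue, 0) AS MarketValue  -- previous value, else 0
    , COALESCE(ab.NetReturn, 0) AS NetReturn                      -- derived rows get 0
FROM (
      SELECT CustodianAccountNum, symbol
           , MIN(enddate) AS symstart
           , MAX(enddate) AS symend
      FROM Allbucket
      GROUP BY CustodianAccountNum, symbol
      ) s
      JOIN (
            SELECT DISTINCT CustodianAccountNum, enddate   -- dates scoped per account
            FROM Allbucket
      ) d ON d.CustodianAccountNum = s.CustodianAccountNum
         AND d.enddate BETWEEN s.symstart AND s.symend
      LEFT JOIN Allbucket ab ON ab.CustodianAccountNum = s.CustodianAccountNum
                  AND ab.symbol = s.symbol
                  AND ab.enddate = d.enddate
      OUTER APPLY (
            SELECT TOP 1 t.ManagerName, t.MarketValue
            FROM Allbucket t
            WHERE t.CustodianAccountNum = s.CustodianAccountNum
                  AND t.symbol = s.symbol
                  AND t.enddate < d.enddate     -- strictly earlier rows only
            ORDER BY t.enddate DESC             -- most recent of those
      ) ap
ORDER BY s.CustodianAccountNum, s.symbol, d.enddate;
```

With the sample data, the derived amj row for 2014-01-31 should then carry the 2013-12-31 MarketValue and a NetReturn of 0, while every original row keeps its own values. Other columns that must be zeroed on derived rows can follow the NetReturn pattern.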

[EDITS] A correction to the data type on NetValue, and a change on enddate, though the latter is optional.