SQL interpolate missing dates
Using SQL Server 2012, I have a table called Allbucket:
| CustodianAccountNum | symbol | EndDate    | ManagerName      | MarketValue | NetReturn |
|---------------------|--------|------------|------------------|-------------|-----------|
| A9G040819           | wabix  | 12/31/2013 | GMO Benchmark    | 34751.10987 | 0.004072  |
| A9G040819           | wabix  | 1/31/2014  | GMO Benchmark    | 34128.88767 | -0.017905 |
| A9G040819           | wabix  | 2/28/2014  | GMO Benchmark    | 49969.8081  | 0.0202    |
| A9G040819           | wabix  | 3/31/2014  | GMO Benchmark    | 50370.993   | 0.008028  |
| A9G040819           | wabix  | 4/30/2014  | GMO Benchmark    | 50995.0584  | 0.012389  |
| A9G040819           | amj    | 12/31/2013 | JPMorgan Alerian | 1234.55     | -0.008154 |
| A9G040819           | amj    | 2/28/2014  | JPMorgan Alerian | 14849.76    | -0.018599 |
| A9G040819           | amj    | 3/31/2014  | JPMorgan Alerian | 14892.8     | 0.015203  |
| A9G040819           | amj    | 4/30/2014  | JPMorgan Alerian | 15513.6     | 0.041684  |
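I am trying to load this data from one system into another. However, the target system requires that, for each given CustodianAccountNum, all symbols have the same date intervals over the period in which they all exist.
Notice that amj is missing 1/31/2014. The clue is that at least one other security, in this case wabix, has that date within the same time span. Also note that dates are sometimes intra-month, such as 1/15/2014.
I was hoping to do something like a self join with partitioning, where I get all possible distinct dates for a given CustodianAccountNum and then force all rows to have the same periodicity over the time span in which they overlap. For the non-original, interpolated rows, whose dates are 'borrowed' from another symbol that exists in that time span, I want to pull the lagged market value from that symbol's previous row (or 0 if there is no previous row) and force all other values to zero. There are other columns in the raw data, but I have tried to keep this example simple.
So ideally AMJ would look like this, since wabix has the date 1/31/2014: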
| CustodianAccountNum | symbol | EndDate    | ManagerName      | MarketValue | NetReturn |
|---------------------|--------|------------|------------------|-------------|-----------|
| A9G040819           | amj    | 12/31/2013 | JPMorgan Alerian | 1234.55     | -0.008154 |
| A9G040819           | amj    | 1/31/2014  | JPMorgan Alerian | 1234.55     | -0.0      |
| A9G040819           | amj    | 2/28/2014  | JPMorgan Alerian | 14849.76    | -0.018599 |
| A9G040819           | amj    | 3/31/2014  | JPMorgan Alerian | 14892.8     | 0.015203  |
| A9G040819           | amj    | 4/30/2014  | JPMorgan Alerian | 15513.6     | 0.041684  |
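The guiding principle for a missing date is whether any other symbol has that date, partitioned by the given custodian account. There are thousands of different account numbers, but symbols only need to be aligned within a given account number.
I only care about date intervals within the lifetime of each symbol in an account. If another symbol existed for many years before it, I don't need to add many months of 0's. I just need each symbol synchronized, from its first date to its last date, with all the symbols it overlaps in time.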
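The requirement above can be restated as a query that lists the gaps. The following is only a sketch, assuming the Allbucket columns shown in the sample data; the CTE names `life` and `dates` and the column aliases are made up for illustration.

```sql
-- Sketch: list the (symbol, date) pairs that would need an interpolated row.
WITH life AS (      -- first and last date of each symbol within an account
    SELECT CustodianAccountNum, symbol,
           MIN(EndDate) AS FirstDate,
           MAX(EndDate) AS LastDate
    FROM Allbucket
    GROUP BY CustodianAccountNum, symbol
),
dates AS (          -- every date used by any symbol in the account
    SELECT DISTINCT CustodianAccountNum, EndDate
    FROM Allbucket
)
SELECT l.CustodianAccountNum, l.symbol, d.EndDate AS MissingDate
FROM life AS l
JOIN dates AS d
  ON d.CustodianAccountNum = l.CustodianAccountNum
 AND d.EndDate BETWEEN l.FirstDate AND l.LastDate   -- stay inside the symbol's lifetime
WHERE NOT EXISTS (  -- keep only dates the symbol does not already have
    SELECT 1
    FROM Allbucket AS ab
    WHERE ab.CustodianAccountNum = l.CustodianAccountNum
      AND ab.symbol = l.symbol
      AND ab.EndDate = d.EndDate
);
```

For the sample rows this returns a single pair: amj on 1/31/2014. Because the date range is bounded by each symbol's own first and last date, a symbol that starts years later than another does not pick up the earlier dates.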
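UPDATE
Gordon Linoff's reply got me close, but not quite there. I had to change the OUTER APPLY to a CROSS APPLY, otherwise I received thousands of records with NULL in every column.
I modified the query to show all the columns needed, but this query results in every column except market value being 0. Basically, I want to force all values to 0 for the derived rows (1/31/2014 in my example), except for market value, which I want to pull from the previous market value. For all non-derived rows, however, I want to use the original values across the whole row.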
select
    ab.drank,d.EndDate,ab.BranchName,ab.EntityID,ab.CustodianAccountNum,ab.AccountID,ab.ManagerName,
    ab.FTAssetStyle,ab.FTAssetClass,ab.PWMSecurityID,ab.AssetClassCode,ab.AssetClass,ab.Symbol,ab.SecType,
    ab.Cusip,ab.Held,ab.MarketValue,
    0 AS GrossFlow,0 AS GrossWeight,0 AS GrossReturn,0 AS NetFlow,0 AS NetWeight,
    0 AS NetReturn,0 AS PortfolioFees,0 AS PortfolioExpenses,0 AS ManagerFees,0 AS Income
from (select distinct CustodianAccountNum, enddate from Allbucket) d join
     (select distinct CustodianAccountNum, symbol from Allbucket) s
     on d.CustodianAccountNum = s.CustodianAccountNum CROSS apply
     (select top 1 ab.*
      from Allbucket ab
      where d.CustodianAccountNum = ab.CustodianAccountNum and
            d.enddate <= ab.enddate and
            s.symbol = ab.symbol
            AND ab.CustodianAccountNum = 'A9G040819'
      order by d.enddate desc
     ) ab
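You can basically use cross join to generate the rows. In this case it is really a join of the distinct dates and the distinct symbols on CustodianAccountNum, but within each account it is still a Cartesian product.
Then the most recent record for each combination of CustodianAccountNum, symbol, and EndDate can be selected using outer apply.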
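A minimal sketch of that row-generation step, assuming "most recent" means the latest row dated on or before the generated date; the alias `prev` and the choice of output columns are illustrative only.

```sql
-- Sketch: one row per (account, symbol, date), with the latest available
-- Allbucket row for that symbol attached.
SELECT s.CustodianAccountNum, s.symbol, d.EndDate,
       prev.ManagerName, prev.MarketValue
FROM (SELECT DISTINCT CustodianAccountNum, EndDate FROM Allbucket) AS d
JOIN (SELECT DISTINCT CustodianAccountNum, symbol FROM Allbucket) AS s
  ON d.CustodianAccountNum = s.CustodianAccountNum   -- dates x symbols per account
OUTER APPLY (
    SELECT TOP (1) ab.*
    FROM Allbucket AS ab
    WHERE ab.CustodianAccountNum = s.CustodianAccountNum
      AND ab.symbol = s.symbol
      AND ab.EndDate <= d.EndDate    -- rows on or before the generated date
    ORDER BY ab.EndDate DESC         -- latest such row first
) AS prev;
```

With OUTER APPLY, a date that precedes a symbol's first row still produces an output row, with NULL in the `prev` columns; CROSS APPLY would drop such rows instead.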
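The following is a slight variation. It uses left join to bring in the matching record, and then uses information from the two records when there is no match. I'm not sure which columns should be 0, but the idea is: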
select s.CustodianAccountNum, s.symbol, d.EndDate, ab.ManagerName,
       ab.MarketValue, 0 as NetReturn,
       ab.xxx, -- for columns that come from the current row
       coalesce(ab.yyy, abprev.yyy) -- for columns from the previous row
from (select distinct CustodianAccountNum, enddate from Allbucket) d join
     (select distinct CustodianAccountNum, symbol from Allbucket) s
     on d.CustodianAccountNum = s.CustodianAccountNum left join
     Allbucket ab
     on d.CustodianAccountNum = ab.CustodianAccountNum and
        d.enddate = ab.enddate and      -- exact match on the generated date
        s.symbol = ab.symbol outer apply
     (select top 1 p.*
      from Allbucket p
      where d.CustodianAccountNum = p.CustodianAccountNum and
            p.enddate < d.enddate and   -- strictly earlier rows only
            s.symbol = p.symbol
      order by p.enddate desc           -- latest earlier row first
     ) abprev
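A slightly different approach, but one that still uses a Cartesian product and the APPLY operator (this approach needs OUTER APPLY). To get 0 wherever you do not want the previous value carried forward, just amend the COALESCE() accordingly.
MS SQL Server 2014 Schema Setup: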
CREATE TABLE Allbucket
    ([CustodianAccountNum] varchar(9), [symbol] varchar(5), [EndDate] datetime,
     [ManagerName] varchar(16), [MarketValue] numeric, [NetReturn] decimal(12,6))
;
INSERT INTO Allbucket
([CustodianAccountNum], [symbol], [EndDate], [ManagerName], [MarketValue], [NetReturn])
VALUES
('A9G040819', 'wabix', '2013-12-31 00:00:00', 'GMO Benchmark', 34751.10987, 0.004072),
('A9G040819', 'wabix', '2014-01-31 00:00:00', 'GMO Benchmark', 34128.88767, -0.017905),
('A9G040819', 'wabix', '2014-02-28 00:00:00', 'GMO Benchmark', 49969.8081, 0.0202),
('A9G040819', 'wabix', '2014-03-31 00:00:00', 'GMO Benchmark', 50370.993, 0.008028),
('A9G040819', 'wabix', '2014-04-30 00:00:00', 'GMO Benchmark', 50995.0584, 0.012389),
('A9G040819', 'amj', '2013-12-31 00:00:00', 'JPMorgan Alerian', 1234.55, -0.008154),
('A9G040819', 'amj', '2014-02-28 00:00:00', 'JPMorgan Alerian', 14849.76, -0.018599),
('A9G040819', 'amj', '2014-03-31 00:00:00', 'JPMorgan Alerian', 14892.8, 0.015203),
('A9G040819', 'amj', '2014-04-30 00:00:00', 'JPMorgan Alerian', 15513.6, 0.041684)
;
Query 1:
SELECT
      s.CustodianAccountNum
    , s.symbol
    , d.enddate
    , COALESCE(ab.ManagerName, ap.ManagerName) AS ManagerName
    , COALESCE(ab.MarketValue, ap.MarketValue) AS MarketValue
    , COALESCE(ab.NetReturn, 0) AS NetReturn
FROM (
      SELECT
            CustodianAccountNum
          , symbol
          , MIN(enddate) symstart
          , MAX(enddate) symend
      FROM Allbucket
      GROUP BY
            CustodianAccountNum
          , symbol
     ) s
JOIN (
      SELECT DISTINCT
            cast(enddate as date) as enddate
      FROM Allbucket
     ) d ON d.enddate BETWEEN s.symstart AND s.symend
LEFT JOIN Allbucket ab ON s.CustodianAccountNum = ab.CustodianAccountNum
                      AND s.symbol = ab.symbol
                      AND ab.enddate = d.enddate
OUTER APPLY (
      SELECT TOP 1
            t.*
      FROM Allbucket t
      WHERE s.CustodianAccountNum = t.CustodianAccountNum
        AND s.symbol = t.symbol
        AND d.enddate <= t.enddate
      ORDER BY
            d.enddate DESC
     ) ap
Results:
| CustodianAccountNum | symbol | enddate | ManagerName | MarketValue | NetReturn |
|---------------------|--------|------------|------------------|-------------|-----------|
| A9G040819 | amj | 2013-12-31 | JPMorgan Alerian | 1235 | -0.008154 |
| A9G040819 | amj | 2014-01-31 | JPMorgan Alerian | 14850 | 0 |
| A9G040819 | amj | 2014-02-28 | JPMorgan Alerian | 14850 | -0.018599 |
| A9G040819 | amj | 2014-03-31 | JPMorgan Alerian | 14893 | 0.015203 |
| A9G040819 | amj | 2014-04-30 | JPMorgan Alerian | 15514 | 0.041684 |
| A9G040819 | wabix | 2013-12-31 | GMO Benchmark | 34751 | 0.004072 |
| A9G040819 | wabix | 2014-01-31 | GMO Benchmark | 34129 | -0.017905 |
| A9G040819 | wabix | 2014-02-28 | GMO Benchmark | 49970 | 0.0202 |
| A9G040819 | wabix | 2014-03-31 | GMO Benchmark | 50371 | 0.008028 |
| A9G040819 | wabix | 2014-04-30 | GMO Benchmark | 50995 | 0.012389 |
NB: you can use ISNULL() instead of COALESCE().
[EDITS] Corrected the data type on NetValue and made a change to enddate, though that one is optional.
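In the results above, the interpolated amj row for 2014-01-31 shows a MarketValue of 14850, which is the 2014-02-28 value, whereas the question asks for the value from the previous row. The following is a hedged sketch of a variant of Query 1 that looks backward instead. It assumes the schema from the setup above and that EndDate carries no time portion. It also joins the date list on CustodianAccountNum so that one account's dates are not applied to another. The alias `prev` is introduced here for illustration.

```sql
SELECT
      s.CustodianAccountNum
    , s.symbol
    , d.EndDate
    , COALESCE(ab.ManagerName, prev.ManagerName)    AS ManagerName
    , COALESCE(ab.MarketValue, prev.MarketValue, 0) AS MarketValue -- original, else previous, else 0
    , COALESCE(ab.NetReturn, 0)                     AS NetReturn   -- 0 on interpolated rows
FROM (
      -- lifetime of each symbol within its account
      SELECT CustodianAccountNum, symbol,
             MIN(EndDate) AS symstart, MAX(EndDate) AS symend
      FROM Allbucket
      GROUP BY CustodianAccountNum, symbol
     ) AS s
JOIN (
      -- every date used by any symbol in the account
      SELECT DISTINCT CustodianAccountNum, EndDate
      FROM Allbucket
     ) AS d
  ON d.CustodianAccountNum = s.CustodianAccountNum
 AND d.EndDate BETWEEN s.symstart AND s.symend
LEFT JOIN Allbucket AS ab              -- the original row, if one exists for this date
  ON ab.CustodianAccountNum = s.CustodianAccountNum
 AND ab.symbol = s.symbol
 AND ab.EndDate = d.EndDate
OUTER APPLY (
      SELECT TOP (1) t.ManagerName, t.MarketValue
      FROM Allbucket AS t
      WHERE t.CustodianAccountNum = s.CustodianAccountNum
        AND t.symbol = s.symbol
        AND t.EndDate < d.EndDate      -- strictly earlier rows only
      ORDER BY t.EndDate DESC          -- latest earlier row first
     ) AS prev
ORDER BY s.symbol, d.EndDate;
```

With the sample rows, the 2014-01-31 amj row now takes its MarketValue from the 2013-12-31 row. Columns that should be 0 on interpolated rows follow the `COALESCE(ab.NetReturn, 0)` pattern, and original rows keep their own values because `ab` is matched first.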