How to zip 2 Deedle frames and handle missing values?
Given a price frame priceFrame of
28881 29021 29399
2010-01-01 00:00:00 -> 123.535878499576 195.28635425580265 189.92210186152082
2010-01-04 00:00:00 -> 124.19087548338847 198.10448102247753 190.1571733631235
2010-01-05 00:00:00 -> 123.82028508465247 197.8259452373992 190.31388769752525
2010-01-06 00:00:00 -> 124.17363872065654 197.80956077945342 189.98478759528152
2010-01-07 00:00:00 -> 123.4583130672824 197.58017836821244 190.31388769752527
2010-01-08 00:00:00 -> 124.23396739021821 198.10448102247756 190.25120196376457
2010-01-11 00:00:00 -> 125.12166067091142 197.87509861123658 190.73701640041008
2010-01-12 00:00:00 -> 124.9234378994945 195.0569718445617 191.41088803833776
2010-01-13 00:00:00 -> 125.06133200134975 195.64681233060992 191.50491663897884
2010-01-14 00:00:00 -> 124.97514818769021 196.28580619049552 191.56760237273951
2010-01-15 00:00:00 -> 123.71686450826103 192.5829186947483 192.08475967626538
2010-01-18 00:00:00 -> 123.71686450826103 194.10667328370621 192.31983117786805
2010-01-19 00:00:00 -> 123.15666971947407 195.87619474185092 191.94371677530378
2010-01-20 00:00:00 -> 121.5622691667727 191.79646471335064 192.82131704795376
2010-01-21 00:00:00 -> 121.5450324040408 188.38849746062752 192.9937028157957
2010-01-22 00:00:00 -> 121.81220222638535 186.8647428716696 192.9937028157957
2010-01-25 00:00:00 -> 121.94147794687466 184.83307008639233 192.9937028157957
2010-01-26 00:00:00 -> 121.38990153945363 185.9799821425972 193.19743145051802
2010-01-27 00:00:00 -> 120.94174570842405 184.91499237612123 193.3541457849198
2010-01-28 00:00:00 -> 120.44187958919875 182.5392459739825 193.22877431739838
2010-01-29 00:00:00 -> 119.4938576389439 183.75169586197052 193.35414578491978
and a dividend frame divFrame of
28881 29021 29399
2010-01-04 00:00:00 -> 1.3 <missing> <missing>
2010-01-13 00:00:00 -> <missing> 1.3 <missing>
2010-01-22 00:00:00 -> <missing> <missing> 1.3
I want to combine them so that I get price + dividend where a dividend exists, and the unchanged price otherwise.
The following two attempts
let dfZipped1 = priceFrame.Zip(divFrame, JoinKind.Left, JoinKind.Left, Lookup.Exact, false, fun (p:float) d -> p + d)
dfZipped1.Print()
let dfZipped2 = priceFrame.Zip(divFrame, JoinKind.Left, JoinKind.Left, Lookup.Exact, true, fun (p:float) d -> p + d)
dfZipped2.Print()
give the same result:
28881 29021 29399
2010-01-01 00:00:00 -> <missing> <missing> <missing>
2010-01-04 00:00:00 -> 125.49087548338846 <missing> <missing>
2010-01-05 00:00:00 -> <missing> <missing> <missing>
2010-01-06 00:00:00 -> <missing> <missing> <missing>
2010-01-07 00:00:00 -> <missing> <missing> <missing>
2010-01-08 00:00:00 -> <missing> <missing> <missing>
2010-01-11 00:00:00 -> <missing> <missing> <missing>
2010-01-12 00:00:00 -> <missing> <missing> <missing>
2010-01-13 00:00:00 -> <missing> 196.94681233060993 <missing>
2010-01-14 00:00:00 -> <missing> <missing> <missing>
2010-01-15 00:00:00 -> <missing> <missing> <missing>
2010-01-18 00:00:00 -> <missing> <missing> <missing>
2010-01-19 00:00:00 -> <missing> <missing> <missing>
2010-01-20 00:00:00 -> <missing> <missing> <missing>
2010-01-21 00:00:00 -> <missing> <missing> <missing>
2010-01-22 00:00:00 -> <missing> <missing> 194.2937028157957
2010-01-25 00:00:00 -> <missing> <missing> <missing>
2010-01-26 00:00:00 -> <missing> <missing> <missing>
2010-01-27 00:00:00 -> <missing> <missing> <missing>
2010-01-28 00:00:00 -> <missing> <missing> <missing>
2010-01-29 00:00:00 -> <missing> <missing> <missing>
The non-missing numbers are correct, but I want to keep the prices on the dates that have no dividend. So I tried
let dfZipped3 = priceFrame.Zip(divFrame, JoinKind.Left, JoinKind.Left, Lookup.Exact, false, fun (p:float) d -> p + (d |> Option.defaultValue 0.0))
dfZipped3.Print()
which results in
28881 29021 29399
2010-01-01 00:00:00 -> 123.535878499576 195.28635425580265 189.92210186152082
2010-01-04 00:00:00 -> 124.19087548338847 198.10448102247753 190.1571733631235
2010-01-05 00:00:00 -> 123.82028508465247 197.8259452373992 190.31388769752525
2010-01-06 00:00:00 -> 124.17363872065654 197.80956077945342 189.98478759528152
2010-01-07 00:00:00 -> 123.4583130672824 197.58017836821244 190.31388769752527
2010-01-08 00:00:00 -> 124.23396739021821 198.10448102247756 190.25120196376457
2010-01-11 00:00:00 -> 125.12166067091142 197.87509861123658 190.73701640041008
2010-01-12 00:00:00 -> 124.9234378994945 195.0569718445617 191.41088803833776
2010-01-13 00:00:00 -> 125.06133200134975 195.64681233060992 191.50491663897884
2010-01-14 00:00:00 -> 124.97514818769021 196.28580619049552 191.56760237273951
2010-01-15 00:00:00 -> 123.71686450826103 192.5829186947483 192.08475967626538
2010-01-18 00:00:00 -> 123.71686450826103 194.10667328370621 192.31983117786805
2010-01-19 00:00:00 -> 123.15666971947407 195.87619474185092 191.94371677530378
2010-01-20 00:00:00 -> 121.5622691667727 191.79646471335064 192.82131704795376
2010-01-21 00:00:00 -> 121.5450324040408 188.38849746062752 192.9937028157957
2010-01-22 00:00:00 -> 121.81220222638535 186.8647428716696 192.9937028157957
2010-01-25 00:00:00 -> 121.94147794687466 184.83307008639233 192.9937028157957
2010-01-26 00:00:00 -> 121.38990153945363 185.9799821425972 193.19743145051802
2010-01-27 00:00:00 -> 120.94174570842405 184.91499237612123 193.3541457849198
2010-01-28 00:00:00 -> 120.44187958919875 182.5392459739825 193.22877431739838
2010-01-29 00:00:00 -> 119.4938576389439 183.75169586197052 193.35414578491978
All the prices are there, but none of the dividends have been added. The pointwise variant
let dfZipped4 = priceFrame.Zip(divFrame, JoinKind.Left, JoinKind.Left, Lookup.Exact, true, fun (p:float) d -> p + (d |> Option.defaultValue 0.0))
dfZipped4.Print()
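gives nothing but missing values.
How can I add the dividend to the price where the dates align, and leave the price unchanged otherwise?
Update
I have timed the execution of each of Frocha's and zyzhu's answers. zyzhu's second answer, as it stands, does not produce the correct result.
Running each technique 1000 times in a row, I get typical timings of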
frocha1: 572.974400
frocha2: 562.867600
zyzhu1: 1099.057100
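frocha2 is consistently slightly faster than frocha1. zyzhu1 is always slower than the others. So for now I am accepting Frocha's answer.
But if zyzhu2 can be made to work, it may well be the fastest, since it is the simplest. In that case I will change the accepted answer.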
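For anyone who wants to repeat this kind of comparison, here is a minimal timing sketch. It is not the harness used for the numbers above (the post does not show one, nor the unit of the timings); `timeRuns` is a hypothetical helper and the workload is a stand-in for one of the frame-combining techniques.

```fsharp
open System.Diagnostics

// Hypothetical helper: run `action` `runs` times in a row and return the
// elapsed wall-clock time in milliseconds.
let timeRuns (runs : int) (action : unit -> 'a) : float =
    action () |> ignore              // warm-up call so JIT compilation is not measured
    let sw = Stopwatch.StartNew()
    for _ in 1 .. runs do
        action () |> ignore
    sw.Stop()
    sw.Elapsed.TotalMilliseconds

// Stand-in workload (an assumption); replace the lambda body with the
// technique being measured, e.g. one of the frame-combining expressions.
let ms = timeRuns 1000 (fun () -> Array.sum [| 1.0 .. 1000.0 |])
printfn "workload: %f" ms
```

Passing each technique as a `unit -> 'a` function keeps the loop and the warm-up identical across candidates, so only the measured expression differs.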
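My approach, which does not take speed into account, is as follows:
1) Rename the columns so that the Join can be performed without errors.
2) Join the frames.
3) Replace the missing values with zero.
4) Sum the corresponding columns.
5) Drop the dividend columns.
6) Optional: if the "string" conversion is not wanted, change priceFrame's column keys back to their original type.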
module Frame =
    // I usually add this handy function to the Frame module
    let mapReplaceCol col f frame =
        frame
        |> Frame.replaceCol col (Frame.mapRowValues f frame)

let priceFrame' = priceFrame |> Frame.mapColKeys string

// prepends a "D" to each column key so the two frames share no column names
// (dividends is the question's divFrame)
let dividends' =
    dividends
    |> Frame.mapColKeys (string >> (+) "D")

let joinedFrame =
    priceFrame'
    |> Frame.join JoinKind.Right dividends'
    |> Frame.fillMissingWith 0.

// for each price column: add its "D" column to it, then drop the "D" column
(joinedFrame, priceFrame'.ColumnKeys |> List.ofSeq)
||> List.fold (fun acc elem ->
    acc
    |> Frame.mapReplaceCol elem (fun row ->
        row.GetAs<float>("D" + elem) + row.GetAs<float>(elem))
    |> Frame.dropCol ("D" + elem))
Edit
Another approach, using Zip.
// generate a dividends frame with the same rows as priceFrame
let dividends2 =
    (priceFrame, priceFrame.ColumnKeys |> List.ofSeq)
    ||> List.fold (fun acc elem -> acc |> Frame.dropCol elem) // empty frame
    |> Frame.join JoinKind.Outer dividends
    |> Frame.fillMissingWith 0.

(priceFrame, dividends2) ||> Frame.zip (fun (p : float) (d : float) -> p + d)
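With zip, the two frames are matched column by column and the two series in each pair are added together.
In your case, divFrame has fewer observations than priceFrame. When two series that do not have the same observations are added together, the result is missing wherever the keys do not match.
My solution is to create a dummy frame that first aligns divFrame with priceFrame.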
let divFrame2 =
    let dummy =
        priceFrame.RowKeys
        |> Seq.collect (fun row -> divFrame.ColumnKeys |> Seq.map (fun col -> row, col, 0))
        |> Frame.ofValues
    (dummy + divFrame).FillMissing(0.)

priceFrame + divFrame2
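The alignment idea can also be sketched on toy data, assuming `Frame.realignRows` is used in place of the dummy frame. The frames, dates and round numbers below are made up for illustration and are not the data from the question; both frames are assumed to share the same column keys.

```fsharp
#r "nuget: Deedle"
open System
open Deedle

let d1, d2, d3 = DateTime(2010, 1, 1), DateTime(2010, 1, 4), DateTime(2010, 1, 5)

// Toy stand-in for priceFrame: three dates, two columns, every cell present.
let prices =
    Frame.ofValues [ for d in [ d1; d2; d3 ] do
                       for c in [ "A"; "B" ] -> d, c, 100.0 ]

// Toy stand-in for divFrame: only two cells are present, the rest are missing.
let divs = Frame.ofValues [ d2, "A", 1.5; d3, "B", 2.0 ]

// Give the dividends the same row keys as the prices, then turn every
// missing cell into 0.0 so that the addition never meets a missing value.
let divsAligned =
    divs
    |> Frame.realignRows prices.RowKeys
    |> Frame.fillMissingWith 0.0

let result = prices + divsAligned
result.Print()
```

Rows without a dividend keep the plain price because they are added to 0.0 rather than to a missing value.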
I had a similar problem; my solution is similar to zyzhu's, but it can handle multiple frames.
let zipAll (dfs : Frame<_,_>[]) =
    let outerKeys = dfs |> Array.collect (fun df -> df.RowKeys |> Array.ofSeq) |> Array.distinct |> Array.sort
    let dfsNew =
        dfs
        |> Array.map (Frame.indexRowsWith outerKeys >> Frame.mapRowValues (Series.fillMissingWith 0.) >> Frame.ofRows)
    Array.fold (Frame.zip (+)) (Array.head dfsNew) (Array.tail dfsNew)

[| priceFrame; dividends2 |] |> zipAll
It may be slow, but it can handle multiple frames.
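A variant of the same multi-frame idea, sketched under the assumption that `Frame.realignRows` does the alignment and that every frame has the same column keys and float values. `sumAligned` is a hypothetical helper, and the toy frames are made up for illustration.

```fsharp
#r "nuget: Deedle"
open System
open Deedle

// Hypothetical helper: align every frame to the union of all row keys,
// treat missing cells as 0.0, then add the frames together.
// Assumes all frames share the same column keys.
let sumAligned (frames : Frame<DateTime, string>[]) =
    let allKeys =
        frames
        |> Seq.collect (fun f -> f.RowKeys)
        |> Seq.distinct
        |> Seq.sort
        |> Array.ofSeq
    frames
    |> Array.map (Frame.realignRows allKeys >> Frame.fillMissingWith 0.0)
    |> Array.reduce (fun a b -> a + b)

// Toy data: one price frame and two sparse adjustment frames.
let d1, d2 = DateTime(2010, 1, 1), DateTime(2010, 1, 4)
let prices = Frame.ofValues [ d1, "A", 100.0; d2, "A", 101.0 ]
let divs   = Frame.ofValues [ d2, "A", 1.5 ]
let extras = Frame.ofValues [ d1, "A", 0.25 ]

let total = sumAligned [| prices; divs; extras |]
total.Print()
```

Filling with 0.0 only makes sense for additive adjustments such as dividends; a price frame with genuine gaps would have those gaps turned into zeros as well.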