How to fill 3d array with contents from two other 3d arrays with unequal first and second dimensions
I have two 3D data arrays from two different vendors.
For both arrays, the dimensions are:
Dimension 1: dates
Dimension 2: instruments (different futures deliveries)
Dimension 3: six instrument attributes (open, high, low, close, volume, openInterest)
For each 3D array I have two variables holding its dates and instruments (e.g. A1Times and A1Inst in my code).
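However, despite considerable overlap, the dates and instruments in the two arrays are not identical. Some dates and/or instruments may exist in Array1 but not in Array2, and vice versa.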
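I am trying to create Array3, a third 3D data array whose first dimension is the union of the dates from both sources, whose second dimension is the union of the available instruments, and whose third dimension is again the six instrument attributes.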
I want to fill Array3 from Array2 wherever possible, and from Array1 only where Array2 has no data.
So, for a given instrument and date, if data exists in both Array1 and Array2, Array3 should take the value from Array2.
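Stated per cell, the rule is: take Array2's value, and fall back to Array1 only where Array2 is NaN (after alignment, a date or instrument missing from Array2 is also NaN). A minimal sketch on two column vectors that are assumed to be already aligned to the same dates; the names a1, a2, out and gap are hypothetical:

```matlab
% Hypothetical vectors for one instrument/attribute, aligned to the same dates.
a1 = [1; NaN; 3; NaN];     % Array1 values
a2 = [10; 20; NaN; NaN];   % Array2 values

out = a2;                  % Array2 takes priority wherever it has data
gap = isnan(out);          % cells Array2 does not cover
out(gap) = a1(gap);        % fall back to Array1 there (may still be NaN)
disp(out.')                % shows 10 20 3 NaN
```

A single isnan mask on the Array2 values covers all three cases that the loop in the question handles separately (both populated, only Array2, only Array1).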
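I tried a solution that converts slices of the arrays to timetables, uses retime to give the slices the same row times, and copies the data into the third array. It is slow, and I think there must be a better way. I would be grateful if someone could show me a vectorized way to do this.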
```matlab
Array1 = randn(4,5,6); % time x instrument x attribute
A1Times = datetime([today-3:today]', 'ConvertFrom','datenum'); % times of first dimension of Array1
A1Inst = [3 4 5 6 7]'; % instruments of second dimension of Array1
Array1(round(1 + (numel(Array1)-1).*rand(round(numel(Array1)/5),1))) = NaN; % put a few random NaNs in the array
Array2 = randn(6,8,6);
A2Times = datetime([today-2:today+3]','ConvertFrom','datenum'); % times of first dimension of Array2
A2Inst = [1 2 5 6 7 8 9 10]'; % instruments of second dimension of Array2
Array2(round(1 + (numel(Array2)-1).*rand(round(numel(Array2)/5),1))) = NaN; % put a few random NaNs in the array
% third dimension will always be the same for both matrices
dateUnion = union(A1Times,A2Times);
instrumentUnion = union(A1Inst,A2Inst);
% Initialize Array3:
Array3 = NaN(numel(dateUnion),numel(instrumentUnion),6);
% What I want to do:
% if data exists in both Array1 and Array2, populate Array3 from Array2
% if data exists only in Array2, populate Array3 from Array2
% if data exists only in Array1, populate Array3 from Array1
%% clumsy retime solution, with two for loops
A1varnames = matlab.lang.makeValidName(cellstr([repmat('Array1Instrument',numel(A1Inst),1) num2str(A1Inst)]));
A2varnames = matlab.lang.makeValidName(cellstr([repmat('Array2Instrument',numel(A2Inst),1) num2str(A2Inst)]));
for ij = 1:6 % looping through third dimension
    A1layer = array2timetable(Array1(:,:,ij),'RowTimes',A1Times);
    A1layer.Properties.VariableNames = A1varnames;
    A2layer = array2timetable(Array2(:,:,ij),'RowTimes',A2Times);
    A2layer.Properties.VariableNames = A2varnames;
    A1layer = retime(A1layer,dateUnion);
    A2layer = retime(A2layer,dateUnion);
    for ii = 1:numel(instrumentUnion)
        [~,A1loc] = ismember(instrumentUnion(ii),A1Inst);
        [~,A2loc] = ismember(instrumentUnion(ii),A2Inst);
        if (A1loc == 0)
            Array3(:,ii,ij) = A2layer{:,A2loc};
        elseif A2loc == 0
            Array3(:,ii,ij) = A1layer{:,A1loc};
        else % if instrument exists in both sources
            A1vec = A1layer{:,A1loc};
            A2vec = A2layer{:,A2loc};
            % if data exists in Array2 and Array1, choose Array2
            % if data exists in Array2 and not Array1, choose Array2
            % if data exists in Array1 and not Array2, choose Array1
            bothpopulated = ~isnan(A1vec) & ~isnan(A2vec);
            onlyA2populated = ~isnan(A2vec) & isnan(A1vec);
            onlyA1populated = isnan(A2vec) & ~isnan(A1vec);
            Array3(bothpopulated,ii,ij) = A2vec(bothpopulated);
            Array3(onlyA2populated,ii,ij) = A2vec(onlyA2populated);
            Array3(onlyA1populated,ii,ij) = A1vec(onlyA1populated);
        end
    end
end
```
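First, you need to map AxTimes and AxInst to consecutive integers so that they can be used as multidimensional array indices. The third output of unique gives exactly these indices. After that, you only need logical and multidimensional array indexing to assign the values. Here I simplified your example and changed A1Times and A2Times to plain numbers.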
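To see what the third output of unique provides, here is a tiny hypothetical case: unique on the concatenated time vectors returns the sorted union ut and, for every input element, its row in ut.

```matlab
% Tiny hypothetical time vectors sharing one value (20).
A1Times = [10; 20; 30];
A2Times = [20; 40];

[ut, ~, iut] = unique([A1Times; A2Times]);
% ut  = [10; 20; 30; 40]   sorted union
% iut = [1; 2; 3; 2; 4]    ut(iut) reproduces the concatenated input

rowsA1 = iut(1:numel(A1Times));      % Array1's rows in Array3: [1; 2; 3]
rowsA2 = iut(numel(A1Times)+1:end);  % Array2's rows in Array3: [2; 4]
```

Because Array1's elements come first in the concatenation, the first numel(A1Times) entries of iut are Array1's rows and the rest are Array2's; iui plays the same role for the instrument columns.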
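```matlab
Array1 = randn(4,5,6);
A1Times = [1 2 3 4].';
A1Inst = [3 4 5 6 7].';
Array1(round(1 + (numel(Array1)-1).*rand(round(numel(Array1)/5),1))) = NaN;
Array2 = randn(6,8,6);
A2Times = [3 4 5 6 7 8].';
A2Inst = [1 2 5 6 7 8 9 10].';
Array2(round(1 + (numel(Array2)-1).*rand(round(numel(Array2)/5),1))) = NaN;
% Sorted unions; iut/iui give each input element's position in the union
% (Array1's entries first, then Array2's).
[ut,~,iut] = unique([A1Times; A2Times]);
[ui,~,iui] = unique([A1Inst; A2Inst]);
Array3 = NaN(numel(ut), numel(ui), 6);
% Place all of Array2 first, since it takes priority.
Array3(iut(numel(A1Times)+1:end), iui(numel(A1Inst)+1:end), :) = Array2;
% Mark cells covered by Array1 that are still NaN after placing Array2.
idx3 = false(size(Array3));
idx3(iut(1:numel(A1Times)), iui(1:numel(A1Inst)), :) = true;
idx3 = idx3 & isnan(Array3);
% The same mask expressed in Array1's own layout.
idx1 = idx3(iut(1:numel(A1Times)), iui(1:numel(A1Inst)), :);
% Fill the gaps from Array1. Pairing the two sides by linear order works
% because A1Times and A1Inst are sorted ascending, so Array1's element
% order matches the order of its cells in Array3.
Array3(idx3) = Array1(idx1);
```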
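The answer pairs Array3(idx3) with Array1(idx1) by linear order, which lines up only when A1Times and A1Inst are sorted ascending (true in the question's data). A minimal sketch of a variant that drops that assumption, using ismember against the unions and copying whole blocks with subscript vectors; it assumes each source lists every date and instrument at most once, and the small inputs are hypothetical. Numeric times keep it short, but union and ismember also accept datetime arrays.

```matlab
% Hypothetical small inputs (date x instrument x attribute).
A1Times = [3; 1; 2];  A1Inst = [5; 3];    % deliberately unsorted
A2Times = [2; 4];     A2Inst = [3; 7];
Array1 = reshape(1:12, 3, 2, 2);
Array2 = 100 + reshape(1:8, 2, 2, 2);
Array2(1, 1, 2) = NaN;                    % gap in Array2 that Array1 can fill

dateUnion = union(A1Times, A2Times);      % [1; 2; 3; 4]
instUnion = union(A1Inst, A2Inst);        % [3; 5; 7]

% Position of every source date/instrument inside the unions.
[~, r1] = ismember(A1Times, dateUnion);
[~, c1] = ismember(A1Inst, instUnion);
[~, r2] = ismember(A2Times, dateUnion);
[~, c2] = ismember(A2Inst, instUnion);

Array3 = NaN(numel(dateUnion), numel(instUnion), size(Array1, 3));
Array3(r1, c1, :) = Array1;               % lay down Array1 first

block = Array3(r2, c2, :);                % Array1 values where Array2 goes
has2 = ~isnan(Array2);                    % cells where Array2 has data
block(has2) = Array2(has2);               % Array2 wins; Array1 kept elsewhere
Array3(r2, c2, :) = block;
```

Subscripted block assignment lets MATLAB place each element by its own row and column, so no ordering between the two sides has to be matched by hand.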