Join elements by iterating through the data
I have some data in this form:
    ID  A  B  VALUE  EXPECTED RESULT
    1   1  2  5      GROUP1
    2   2  3  5      GROUP1
    3   3  4  6      GROUP2
    4   3  5  5      GROUP1
    5   6  4  5      GROUP3
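What I would like to do is iterate through the data (thousands of rows) and create a common field so that I can join the data easily. (A -> start node, B -> end node, VALUE -> order. The data form a chain-like structure in which only neighbors share a common A or B.)
Rules for joining:
1. All elements of a group have the same VALUE.
2. A of element one equals B of element two (or the opposite, but not A=A' or B=B').
3. The most difficult one: assign all sequential data that form a series of intersecting nodes to the same group.
That is, the first element [1 1 2 5] has to be joined with [2 2 3 5] and then with [4 3 5 5].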
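Any idea how to do this robustly when iterating through a large amount of data? I have a problem with rule 3; the other rules are easy to apply. For limited data I have had some success, but the result depends on the order in which I start examining the data, and this does not work for the large dataset.
I can use arcpy (preferably), or even Python, R, or MATLAB, to solve this. I tried arcpy without success, so I am checking alternatives.
In ArcPy this code works okay, but only to a limited extent (i.e. in large features with many segments I get 3-4 groups instead of 1):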
    import arcpy

    TheShapefile = "c:/Temp/temp.shp"
    desc = arcpy.Describe(TheShapefile)
    flds = desc.fields
    fldin = 'no'
    for fld in flds:  # Check if new field exists
        if fld.name == 'new':
            fldin = 'yes'
    if fldin != 'yes':  # If not create
        arcpy.AddField_management(TheShapefile, "new", "SHORT")
    arcpy.CalculateField_management(TheShapefile, "new", '!FID!', "PYTHON_9.3")  # Copy FID to new
    with arcpy.da.SearchCursor(TheShapefile, ["FID", "NODE_A", "NODE_B", "ORDER_", "new"]) as TheSearch:
        for SearchRow in TheSearch:
            if SearchRow[1] == SearchRow[4]:
                Outer_FID = SearchRow[0]
            else:
                Outer_FID = SearchRow[4]
            Outer_NODEA = SearchRow[1]
            Outer_NODEB = SearchRow[2]
            Outer_ORDER = SearchRow[3]
            Outer_NEW = SearchRow[4]
            # For every outer row, rescan the whole table and relabel its neighbors
            with arcpy.da.UpdateCursor(TheShapefile, ["FID", "NODE_A", "NODE_B", "ORDER_", "new"]) as TheUpdate:
                for UpdateRow in TheUpdate:
                    Inner_FID = UpdateRow[0]
                    Inner_NODEA = UpdateRow[1]
                    Inner_NODEB = UpdateRow[2]
                    Inner_ORDER = UpdateRow[3]
                    if Inner_ORDER == Outer_ORDER and (Inner_NODEA == Outer_NODEB or Inner_NODEB == Outer_NODEA):
                        UpdateRow[4] = Outer_FID
                        TheUpdate.updateRow(UpdateRow)
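One way to see the order dependence is a simplified pure-Python model of the nested-cursor relabeling. This is only a sketch under assumptions, not the arcpy script itself: `single_pass_labels` is a hypothetical helper, rows are plain `(A, B, ORDER)` tuples, labels start as the row index (like copying FID to `new`), and the extra `SearchRow[1]==SearchRow[4]` check is left out.

```python
def single_pass_labels(rows):
    """Model of one relabeling pass over rows given as (a, b, order) tuples.

    Each row starts with its own index as label; every row in turn
    pushes its current label onto its direct neighbors.
    """
    labels = list(range(len(rows)))
    for i, (a_i, b_i, order_i) in enumerate(rows):
        for j, (a_j, b_j, order_j) in enumerate(rows):
            # Same join test as the cursor loop: equal order, and
            # A of one row equals B of the other
            if order_j == order_i and (a_j == b_i or b_j == a_i):
                labels[j] = labels[i]
    return labels


# The same five-segment chain, stored in two different row orders
chain = [(1, 2, 5), (2, 3, 5), (3, 4, 5), (4, 5, 5), (5, 6, 5)]
shuffled = [(3, 4, 5), (1, 2, 5), (5, 6, 5), (2, 3, 5), (4, 5, 5)]

print(single_pass_labels(chain))     # [0, 0, 0, 0, 0]
print(single_pass_labels(shuffled))  # [2, 1, 2, 1, 2]
```

In this model the ordered chain ends up with one label, while the shuffled copy of the same chain ends up with two, because a label only travels to direct neighbors and later rows overwrite earlier assignments.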
There is also some data in shapefile form and dbf form.
Using MATLAB:
    A = [1 1 2 5
         2 2 3 5
         3 3 4 6
         4 3 5 5
         5 6 4 5]
    %% Initialization
    % index of matrix line sharing the same group
    ind = 1
    % length of the index
    len = length(ind)
    % the group array
    g = []
    % group counter
    c = 1
    % Start the small algorithm
    while 1
        % Check if another line with the same "Value" share some common node
        ind = find(any(ismember(A(:,2:3),A(ind,2:3)) & A(:,4) == A(ind(end),4),2));
        % If there is no new line, we create a group with the discovered line
        if length(ind) == len
            %group assignment
            g(A(ind,1)) = c
            c = c+1
            % delete the already discovered line (or node...)
            A(ind,:) = []
            % break if no more node
            if isempty(A)
                break
            end
            % reset the index for the next group
            ind = 1;
        end
        len = length(ind);
    end
Here is the output:
    g =
         1     1     2     1     3
As expected.
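Since Python was the preferred route, the same grouping can also be treated as a connected-components problem. The following is a minimal sketch, not a port of the MATLAB loop: it swaps in a union-find (disjoint-set) structure, and `assign_groups` and the `(ID, A, B, VALUE)` tuple layout are names assumed here for illustration. Unlike the `ismember` test, it links two rows only when the B of one equals the A of the other and their VALUE matches, as rule 2 asks.

```python
from collections import defaultdict


def assign_groups(rows):
    """rows: iterable of (row_id, a, b, value). Returns {row_id: group_number}."""
    rows = list(rows)
    parent = {row_id: row_id for row_id, _, _, _ in rows}

    def find(x):
        # Iterative find with path halving (no recursion limit issues)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    starts = defaultdict(list)  # (value, node) -> ids of rows whose A is node
    ends = defaultdict(list)    # (value, node) -> ids of rows whose B is node
    for row_id, a, b, value in rows:
        starts[(value, a)].append(row_id)
        ends[(value, b)].append(row_id)

    # Rules 1 and 2: rows ending at a node join rows starting at that node,
    # but only within the same value. Nodes with no starter link nothing.
    for key, enders in ends.items():
        starters = starts.get(key)
        if starters:
            anchor = enders[0]
            for other in enders[1:] + starters:
                union(anchor, other)

    # Rule 3 follows from transitivity; number groups by first appearance
    group_of_root = {}
    result = {}
    for row_id, _, _, _ in rows:
        root = find(row_id)
        if root not in group_of_root:
            group_of_root[root] = len(group_of_root) + 1
        result[row_id] = group_of_root[root]
    return result


rows = [(1, 1, 2, 5), (2, 2, 3, 5), (3, 3, 4, 6), (4, 3, 5, 5), (5, 6, 4, 5)]
print(assign_groups(rows))  # {1: 1, 2: 1, 3: 2, 4: 1, 5: 3}
```

Each row is visited a constant number of times, so the membership of each group does not depend on the order of the rows; only the group numbers do, since they follow first appearance. With arcpy, the tuples could be read with a search cursor and the resulting numbers written back with an update cursor, but that part is not shown here.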