How to group data in a single column based on another column using MATLAB
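I have 99940 rows of three-column data: the first and second columns hold ID numbers and the third column holds a weight, as in the sample data below. There are 441 unique ID values, which repeat across the first and second columns. I want to group id1 so that each group contains three consecutive values, and then sum the weights of the rows whose id2 equals one of the id1 values in the same group.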
data:
id1 id2 weight
1 3 10
1 4 10
1 7 10
1 8 10
2 1 10
2 5 10
3 2 10
4 3 10
4 6 10
5 3 10
6 4 10
7 2 10
8 1 10
result:
group(1)
id1 id2 weight selected
1 3 10 Yes (Because group1 has 1,2,3 and id1 is 1 and id2 is 3)
1 4 10 No
1 7 10 No
1 8 10 No
2 1 10 Yes (Because group1 has 1,2,3 and id1 is 2 and id2 is 1)
2 5 10 No
3 2 10 Yes (Because group1 has 1,2,3 and id1 is 3 and id2 is 2)
Weight = 30
group(2)
4 3 10 No
4 6 10 Yes (Because group2 has 4,5,6 and id1 is 4 and id2 is 6)
5 3 10 No
6 4 10 Yes (Because group2 has 4,5,6 and id1 is 6 and id2 is 4)
Weight=20
group(3)
7 2 10 No
8 1 10 No
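and so on.
I have tried using grouppixels and sortrows to compute the weight score and to look up column values based on another column, but I got stuck on the grouping step.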
You can use the ismember function like this:
First, determine the unique id1 values:
id1 = data(:, 1);
unique_id1 = unique(id1, 'stable');
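To try this step on the sample from the question, data can be built as a plain numeric matrix. This is only a sketch: it assumes the data is stored as an N-by-3 numeric matrix, which the question does not state. The 'stable' option keeps the IDs in order of first appearance instead of sorting them, which only makes a difference when id1 is not already sorted.

```matlab
% Sample rows from the question, one row per record: [id1 id2 weight]
% (assumed storage: a plain numeric matrix)
data = [1 3 10; 1 4 10; 1 7 10; 1 8 10; 2 1 10; 2 5 10; 3 2 10; ...
        4 3 10; 4 6 10; 5 3 10; 6 4 10; 7 2 10; 8 1 10];

id1 = data(:, 1);

% 'stable' returns each ID once, in order of first appearance
unique_id1 = unique(id1, 'stable');

% Show the unique IDs as a row vector
disp(unique_id1.')
```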
Then loop over the unique IDs in groups of three, and extract the rows from data that match any of the three IDs.
weights = [];
groups = cell(0);
n_ids = numel(unique_id1);
for ii = 1:3:n_ids
    % Pull out just the id1 values in this group
    % (the last group may hold fewer than three IDs)
    ids_in_group = unique_id1(ii:min(ii+2, n_ids));
    % answer has 1 if id1 is in ids_in_group, 0 otherwise
    select_row_for_group = ismember(id1, ids_in_group);
    % Logical indexing, select only rows with 1 in select_row_for_group
    group_data = data(select_row_for_group, :);
    % Append new group to our cell array
    groups{end+1} = group_data;
    % Select a row in the group for weight calculation if its id2 is in ids_in_group
    select_row_for_weight = ismember(group_data(:, 2), ids_in_group);
    % Select only the weights we want
    selected_weights = group_data(select_row_for_weight, 3);
    % Sum the selected weights
    weightsum = sum(selected_weights);
    % Append to weights array
    weights(end+1) = weightsum;
end
Now you have:
>> groups{1}
ans =
1 3 10
1 4 10
1 7 10
1 8 10
2 1 10
2 5 10
3 2 10
>> groups{2}
ans =
4 3 10
4 6 10
5 3 10
6 4 10
>> groups{3}
ans =
7 2 10
8 1 10
>> weights
weights =
30 20 0
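With 99940 rows, growing weights and groups inside a loop may become slow. As a possible alternative (not part of the approach above), the per-group sums can be computed without a loop using the extra outputs of unique and ismember together with accumarray. This is a sketch under the same assumption that data is an N-by-3 numeric matrix; the names row_group, id2_group, selected and n_groups are made up for illustration.

```matlab
% Sample rows from the question: [id1 id2 weight]
data = [1 3 10; 1 4 10; 1 7 10; 1 8 10; 2 1 10; 2 5 10; 3 2 10; ...
        4 3 10; 4 6 10; 5 3 10; 6 4 10; 7 2 10; 8 1 10];

% id_idx(k) is the position of data(k,1) within unique_id1
[unique_id1, ~, id_idx] = unique(data(:, 1), 'stable');

% Positions 1-3 of unique_id1 form group 1, positions 4-6 form group 2, ...
row_group = ceil(id_idx / 3);

% Position of each id2 within unique_id1 (0 when id2 never occurs as an id1),
% converted to a group number; 0 stays 0 and so never matches a real group
[~, id2_idx] = ismember(data(:, 2), unique_id1);
id2_group = ceil(id2_idx / 3);

% A row is selected when its id2 belongs to the same group as its id1
selected = (id2_group == row_group);

% Sum the selected weights per group; groups with no selected rows get 0
n_groups = ceil(numel(unique_id1) / 3);
weights = accumarray(row_group(selected), data(selected, 3), [n_groups 1]).';
```

On the sample data this gives the same weights as the loop, 30 20 0. If the rows of each group are also needed, data(row_group == g, :) returns them for group number g.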