How to group data in one column based on another column using MATLAB

Question

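I have 99,940 rows of three-column data: the first and second columns contain ID numbers, and the third column contains a weight, as in the sketch below. There are 441 unique ID values, and they repeat in both the first and second columns. I want to group the id1 values so that each group contains three consecutive values, and then, within each group, sum the weights of the rows whose id2 is also one of that group's id1 values.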

data:
id1    id2     weight
1       3        10
1       4        10
1       7        10
1       8        10
2       1        10
2       5        10
3       2        10
4       3        10
4       6        10
5       3        10
6       4        10
7       2        10
8       1        10

result:
group(1)
id1    id2     weight   selected
1       3        10       Yes (Because group1 has 1,2,3 and id1 is 1 and id2 is 3)
1       4        10       No
1       7        10       No
1       8        10       No 
2       1        10       Yes (Because group1 has 1,2,3 and id1 is 2 and id2 is 1)
2       5        10       No 
3       2        10       Yes (Because group1 has 1,2,3 and id1 is 3 and id2 is 2)
Weight = 30

group(2)
4       3        10     No
4       6        10     Yes (Because group2 has 4,5,6 and id1 is 4 and id2 is 6)
5       3        10     No
6       4        10     Yes (Because group2 has 4,5,6 and id1 is 6 and id2 is 4)
Weight = 20

group(3)
7       2        10     No
8       1        10     No

And so on.

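I have tried using grouppixels and sortrows to compute the weight score and to look up column values based on another column, but I am stuck on the grouping step.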

Answer 1

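You can use the ismember function like this.

First, find the unique id1 values (with 'stable', they are kept in order of first appearance):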

% data is the N-by-3 matrix [id1, id2, weight]
id1 = data(:, 1);
unique_id1 = unique(id1, 'stable');
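Because the loop groups IDs in the order returned by `unique`, it may help to see what `'stable'` changes. This is a small illustrative check with made-up IDs (the variable `ids` is hypothetical, not from the question): with `'stable'` the values keep their order of first appearance, while the default call sorts them. If data is not already sorted by id1, the sorted order is the one that yields groups of consecutive values such as 1,2,3.

```matlab
% Hypothetical IDs, deliberately out of order and with a repeat
ids = [3; 1; 3; 2];

% 'stable' keeps the order in which each value first appears
in_order_seen = unique(ids, 'stable');   % [3; 1; 2]

% The default sorts the unique values in ascending order
sorted_ids = unique(ids);                % [1; 2; 3]
```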

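Then loop over the unique IDs three at a time, extract the rows of data whose id1 matches any of the IDs in the group, and sum the weights of the rows whose id2 is also in the group.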

weights = [];
groups = cell(0);
for ii = 1:3:numel(unique_id1)
    % Pull out the id1 values in this group (the last group may hold
    % fewer than three if the number of unique IDs is not a multiple of 3)
    ids_in_group = unique_id1(ii:min(ii+2, end));
    
    % select_row_for_group is true where id1 is in ids_in_group
    select_row_for_group = ismember(id1, ids_in_group);

    % Logical indexing, select only rows with 1 in select_row_for_group
    group_data = data(select_row_for_group, :);

    % Append new group to our cell array
    groups{end+1} = group_data;

    % Select a row in the group for weight calculation if its id2 is in ids_in_group
    select_row_for_weight = ismember(group_data(:, 2), ids_in_group);

    % Select only the weights we want
    selected_weights = group_data(select_row_for_weight, 3);

    % Sum the selected weights
    weightsum = sum(selected_weights);

    % Append to weights array
    weights(end+1) = weightsum; 
end

Now you have:

>> groups{1}

ans =

     1     3    10
     1     4    10
     1     7    10
     1     8    10
     2     1    10
     2     5    10
     3     2    10

>> groups{2}

ans =

     4     3    10
     4     6    10
     5     3    10
     6     4    10

>> groups{3}

ans =

     7     2    10
     8     1    10

>> weights

weights =

    30    20     0
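As an alternative sketch (not part of the original answer), the same grouping can be done without a loop, which may be worthwhile with roughly 100,000 rows. It assumes groups are built from the *sorted* unique id1 values, so it should agree with the loop above when data is already sorted by id1, as in the sample. The names `rank1`, `g1`, `g2` and `selected` are illustrative only.

```matlab
% Sample data from the question: [id1, id2, weight]
data = [1 3 10; 1 4 10; 1 7 10; 1 8 10; 2 1 10; 2 5 10; 3 2 10;
        4 3 10; 4 6 10; 5 3 10; 6 4 10; 7 2 10; 8 1 10];
id1 = data(:, 1);
id2 = data(:, 2);
w   = data(:, 3);

% Rank of each id1 among the sorted unique id1 values (1, 2, 3, ...)
[unique_id1, ~, rank1] = unique(id1);
% Group number per row: ranks 1-3 -> group 1, ranks 4-6 -> group 2, ...
g1 = ceil(rank1(:) / 3);

% Locate each id2 among the unique id1 values; ids that are not id1 values get group 0
[is_id1, loc2] = ismember(id2, unique_id1);
g2 = zeros(size(g1));
g2(is_id1) = ceil(loc2(is_id1) / 3);

% A row is selected when its id2 falls in the same group as its id1
selected = (g2 == g1);

% Sum the selected weights per group (row vector, like weights from the loop)
weights = accumarray(g1, w .* selected)';

% Rows belonging to each group, in their original order
groups = arrayfun(@(k) data(g1 == k, :), 1:max(g1), 'UniformOutput', false);
```

Here `accumarray` adds up `w .* selected` for all rows that share the same group number, so unselected rows contribute 0 and a group with no selected rows (group 3 in the sample) ends up with a weight of 0.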
