How to vectorize searching function and Intersection in Matlab?

Question

Here is a Matlab coding problem (a slightly different version of an earlier question, this time using intersect rather than setdiff):

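I have a rating matrix A with 3 columns: the 1st column is the user ID (may be repeated), the 2nd column is the item ID (may be repeated), and the 3rd column is the rating the user gave the item, ranging from 1 to 5.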

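Now I have a subset of user IDs, smallUserIDList, and a subset of item IDs, smallItemIDList. For each user in smallUserIDList I want to find the rows of A rated by that user, collect the items the user rated, and do some calculation on them, such as intersecting with smallItemIDList and counting the result, as the following code does: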

userStat = zeros(length(smallUserIDList), 1);
for i = 1:length(smallUserIDList)
    A2= A(A(:,1) == smallUserIDList(i), :);
    itemIDList_each = unique(A2(:,2));

    setIntersect = intersect(itemIDList_each , smallItemIDList);
    userStat(i) = length(setIntersect);
end
userStat

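The profile viewer shows that the loop above is inefficient. The question is how to improve this piece of code through vectorization, without relying on the for loop.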

For example:

Input:

A = [
1 11 1
2 22 2
2 66 4
4 44 5
6 66 5
7 11 5
7 77 5
8 11 2
8 22 3
8 44 3
8 66 4
8 77 5    
]

smallUserIDList = [1 2 7 8]
smallItemIDList = [11 22 33 55 77]

Output:

userStat =

 1
 1
 2
 3

Answer 1

Wow! You only need a minor edit to the accepted solution of the previous question. Here is the solution:

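% R: row of A, C: position in smallUserIDList, for every row whose user is in the list
[R,C] = find(bsxfun(@eq,A(:,1),smallUserIDList(:).'));
% Flag those rows whose item is also in smallItemIDList (the edit was needed here)
mask = ismember(A(R,2),smallItemIDList(:).');

ARm = A(R,2);
Cm = C(mask);       % user positions of the matching rows
ARm = ARm(mask);    % item IDs of the matching rows

userStat = zeros(numel(smallUserIDList),1);
if ~isempty(Cm)
    % Per user: number of repeated item IDs among the matching rows
    dup_counts = accumarray(Cm,ARm,[],@(x) numel(x)-numel(unique(x)));
    % Per user: number of matching rows
    accums = accumarray(C,mask);
    userStat(1:numel(accums)) = accums;
    % Subtract the repeats so each item is counted once per user
    userStat(1:numel(dup_counts)) = userStat(1:numel(dup_counts)) - dup_counts;
end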
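
For comparison, here is a minimal sketch of a different way to get the same counts, using ismember, unique with the 'rows' option, and accumarray. It is not the approach above, just an alternative under two assumptions: smallUserIDList contains no repeated IDs (ismember reports only the first matching position), and at least one row matches (the no-match case is not handled specially). The names tfUser, userPos, tfItem, keep and pairs are introduced here for illustration only.

```matlab
% Example data from the question
A = [1 11 1; 2 22 2; 2 66 4; 4 44 5; 6 66 5; 7 11 5; 7 77 5; ...
     8 11 2; 8 22 3; 8 44 3; 8 66 4; 8 77 5];
smallUserIDList = [1 2 7 8];
smallItemIDList = [11 22 33 55 77];

% For each row of A: is the user in the list, and at which position?
[tfUser, userPos] = ismember(A(:,1), smallUserIDList);
% For each row of A: is the item in the item list?
tfItem = ismember(A(:,2), smallItemIDList);
keep = tfUser & tfItem;

% Drop repeated (user position, item) pairs so each item counts once per user
pairs = unique([userPos(keep), A(keep,2)], 'rows');

% Count the remaining pairs per user position; users with no match stay at 0
userStat = accumarray(pairs(:,1), 1, [numel(smallUserIDList), 1]);
disp(userStat)
```

On the example from the question this gives the same column as the loop. Whether it is faster than the bsxfun version depends on the data, so it is worth profiling both.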

As a bonus, you can replace the pre-allocation step:

userStat = zeros(numel(smallUserIDList),1);

with this faster pre-allocation scheme:

userStat(numel(smallUserIDList),1) = 0; % column vector like zeros(n,1); userStat must not already exist

Read more about it in this MATLAB Undocumented post on Pre-allocation.
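
A small sketch of how the indexing form of pre-allocation behaves: assigning to the last element of a variable that does not exist yet creates a zero-filled array whose shape follows the subscripts, so (n,1) gives a column and (1,n) gives a row. The variable names and the value of n below are made up for the illustration.

```matlab
n = 4;                    % hypothetical list length

colZeros = zeros(n, 1);   % explicit pre-allocation, n-by-1

% The indexing form only pre-allocates when the variable does not exist yet
clear colIdx rowIdx
colIdx(n, 1) = 0;         % grows to n-by-1, zero-filled
rowIdx(1, n) = 0;         % grows to 1-by-n, zero-filled

disp(size(colZeros))
disp(size(colIdx))
disp(size(rowIdx))
```

If the variable already holds data, the indexing form only overwrites one element (or grows the array) and leaves the old values in place, so it is safest right after a clear or at the top of a function.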
