How can I efficiently calculate the similarity of two movie ranking lists?

Problem description:

You and Peter are talking about n movies, which are represented by the integers in [1, n]. You have made a ranking list of the movies according to your preference. Now Peter tells you his ranking list. You want to know how similar your and Peter's tastes are. For any 2 movies i and j, if you and Peter both rank movie i before movie j, you get 1 similarity. Please output the total similarity.

I know I can solve this problem by brute force. The Java code looks like this:

int n = in.nextInt();
int[] rankingListOfMe = new int[n];
int[] rankingListOfPeter = new int[n];
int[] peterMovie2RankingIndex = new int[n + 1];
for (int i = 0; i < n; ++i) rankingListOfMe[i] = in.nextInt();
for (int i = 0; i < n; ++i) {
    rankingListOfPeter[i] = in.nextInt();
    peterMovie2RankingIndex[rankingListOfPeter[i]] = i;
}
long similarity = 0L;
for (int i = 1; i < n; ++i) {
    if (rankingListOfMe[i] == rankingListOfPeter[0]) continue;
    int curJMovieIndex = peterMovie2RankingIndex[rankingListOfMe[i]];
    for (int j = 0; j < i; ++j) {
        if (peterMovie2RankingIndex[rankingListOfMe[j]] < curJMovieIndex) similarity++;
    }
}

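The rankingListOfXX arrays are indexed by rank and store movie ids. peterMovie2RankingIndex is an array of length n + 1, indexed by movie id, that stores the corresponding rank, so a movie's rank in Peter's list can be looked up by its id. For each movie I visit, I simply count how many of the movies ranked before it in my list are also ranked before it in Peter's list. This solves the problem, but I wonder whether there is a more efficient way. The time complexity of the algorithm above is O(n^2), which is too slow for me. I have thought about it for a long time and cannot see what to optimize. I suspect it is related to sorting algorithms, but I don't know how to use a sorting algorithm to solve this problem.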

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public void similarity(int[] me, int[] peter){
    int[] peterTemp = new int[peter.length];
    Map<Integer, Integer> map = new HashMap<>();
    for(int i = 0; i < me.length; i++){
        map.put(me[i], i);
    }
    for(int i = 0; i < peter.length; i++){ 
        peterTemp[peterTemp.length - (i + 1)] = map.get(peter[i]);
    }

    // As David Eisenstat pointed out, this reduces to counting inversions in an array; the invCount method is copied from here
    // 
    System.out.println(invCount(peterTemp));
}

long merge(int[] arr, int[] left, int[] right) {
    int i = 0, j = 0;
    long count = 0; // long: the number of inversions can exceed Integer.MAX_VALUE for large n
    while (i < left.length || j < right.length) {
        if (i == left.length) {
            arr[i+j] = right[j];
            j++;
        } else if (j == right.length) {
            arr[i+j] = left[i];
            i++;
        } else if (left[i] <= right[j]) {
            arr[i+j] = left[i];
            i++;
        } else {
            arr[i+j] = right[j];
            count += left.length-i;
            j++;
        }
    }
    return count;
}

long invCount(int[] arr) {
    if (arr.length < 2)
        return 0;

    int m = (arr.length + 1) / 2;
    int left[] = Arrays.copyOfRange(arr, 0, m);
    int right[] = Arrays.copyOfRange(arr, m, arr.length);

    return invCount(left) + invCount(right) + merge(arr, left, right);
}

Test:

For {1, 5, 6, 7} and {6, 7, 1, 5} the result is 2.
For {4, 7, 8, 3, 1, 2} and {3, 8, 7, 1, 2, 4} the result is 7.
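
The two results above can be cross-checked against a direct quadratic count. The following is only a minimal sketch for verification on small inputs; the class name BruteForceSimilarity and the method name count are made up for this illustration and are not part of the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, used only to cross-check small test cases in O(n^2).
final class BruteForceSimilarity {
    // Counts the pairs of movies that appear in the same relative order in both lists.
    static long count(int[] me, int[] peter) {
        // Map each movie id to its position in Peter's list.
        Map<Integer, Integer> peterRank = new HashMap<>();
        for (int i = 0; i < peter.length; i++) {
            peterRank.put(peter[i], i);
        }
        long similarity = 0;
        for (int i = 0; i < me.length; i++) {
            for (int j = i + 1; j < me.length; j++) {
                // me[i] precedes me[j] in my list; check whether Peter agrees.
                if (peterRank.get(me[i]) < peterRank.get(me[j])) {
                    similarity++;
                }
            }
        }
        return similarity;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[] {1, 5, 6, 7}, new int[] {6, 7, 1, 5}));
        System.out.println(count(new int[] {4, 7, 8, 3, 1, 2}, new int[] {3, 8, 7, 1, 2, 4}));
    }
}
```

Running main should print 2 and then 7, matching the results listed above.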

Explanation: suppose we built peterTemp like this:

peterTemp[i] = map.get(peter[i]);

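For every pair i, j such that i > j and peterTemp[i] > peterTemp[j], we have found one similarity. Instead, I built peterTemp as peterTemp[peterTemp.length - (i + 1)] = map.get(peter[i]); which reverses the array and turns exactly those pairs into inversions, so that the referenced inversion-counting code can be used unchanged.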
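
If reversing the array feels indirect, the same pairs can also be counted without it. The sketch below is not the merge-sort approach used above; it swaps in a Fenwick tree (binary indexed tree) and walks Peter's list in order, counting for each movie how many already-seen movies have a smaller rank in my list. The class name FenwickSimilarity and the method name count are assumptions made for this illustration. It should run in O(n log n) time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative: counts agreeing pairs directly with a Fenwick tree.
final class FenwickSimilarity {
    static long count(int[] me, int[] peter) {
        int n = me.length;
        // Map each movie id to its 1-based rank in my list.
        Map<Integer, Integer> myRank = new HashMap<>();
        for (int i = 0; i < n; i++) {
            myRank.put(me[i], i + 1);
        }

        int[] tree = new int[n + 1]; // Fenwick tree over my ranks, 1-based
        long similarity = 0;
        for (int movie : peter) {
            int rank = myRank.get(movie);
            // Prefix sum over ranks 1..rank-1: movies Peter ranked earlier
            // that I also ranked before this one.
            for (int k = rank - 1; k > 0; k -= k & -k) {
                similarity += tree[k];
            }
            // Mark this rank as seen.
            for (int k = rank; k <= n; k += k & -k) {
                tree[k]++;
            }
        }
        return similarity;
    }
}
```

For the two test cases above, FenwickSimilarity.count(new int[] {1, 5, 6, 7}, new int[] {6, 7, 1, 5}) should return 2 and the second case should return 7. The running total is kept in a long because the number of pairs can exceed the int range for large n.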