How to calculate the similarity of two movie ranking lists efficiently?
Problem description:
You and Peter are talking about n movies, which are represented by the integers [1, n]. You have made a ranking list for the movies according to your preference. Now, Peter tells you his ranking list. You want to know how similar your and Peter's tastes are. For 2 movies i, j, if you and Peter both rank movie i before movie j, you will get 1 similarity. Please output the total similarity.
I know I can solve this problem by brute force. The Java code looks like this:
    // Assumes `in` is a java.util.Scanner reading the input.
    int n = in.nextInt();
    int[] rankingListOfMe = new int[n];
    int[] rankingListOfPeter = new int[n];
    int[] peterMovie2RankingIndex = new int[n + 1];
    for (int i = 0; i < n; ++i) rankingListOfMe[i] = in.nextInt();
    for (int i = 0; i < n; ++i) {
        rankingListOfPeter[i] = in.nextInt();
        peterMovie2RankingIndex[rankingListOfPeter[i]] = i;
    }
    long similarity = 0L;
    for (int i = 1; i < n; ++i) {
        // Nothing precedes Peter's top movie, so it cannot add any similarity.
        if (rankingListOfMe[i] == rankingListOfPeter[0]) continue;
        int curJMovieIndex = peterMovie2RankingIndex[rankingListOfMe[i]];
        // Count the movies I rank before movie i that Peter also ranks before it.
        for (int j = 0; j < i; ++j) {
            if (peterMovie2RankingIndex[rankingListOfMe[j]] < curJMovieIndex) similarity++;
        }
    }
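The rankingListOfXX arrays are indexed by rank and store movie ids. peterMovie2RankingIndex is an array of length n + 1 that is indexed by movie id and stores the corresponding rank, so that a movie's rank in Peter's list can be looked up by its id. For each movie id I visit, I simply count how many movies satisfy the condition. This approach works, but I would like to know whether there is a more efficient one. The time complexity of the algorithm above is O(n^2), which is too slow for me. I have thought about it for a long time but cannot see where to optimize. I think it is related to sorting algorithms, but I do not know how to use a sorting algorithm to solve this problem.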
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    public void similarity(int[] me, int[] peter) {
        int[] peterTemp = new int[peter.length];
        // Map each movie id to its rank in my list.
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < me.length; i++) {
            map.put(me[i], i);
        }
        // Write my ranks in Peter's order, reversed.
        for (int i = 0; i < peter.length; i++) {
            peterTemp[peterTemp.length - (i + 1)] = map.get(peter[i]);
        }
        // As David Eisenstat pointed out, we are going to count inversions in the array.
        // The invCount method is copied from an external source.
        System.out.println(invCount(peterTemp));
    }
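    // Merges the sorted halves into arr and returns the number of cross inversions.
    long merge(int[] arr, int[] left, int[] right) {
        int i = 0, j = 0;
        long count = 0; // long: the total can reach n * (n - 1) / 2 and overflow an int
        while (i < left.length || j < right.length) {
            if (i == left.length) {
                arr[i + j] = right[j];
                j++;
            } else if (j == right.length) {
                arr[i + j] = left[i];
                i++;
            } else if (left[i] <= right[j]) {
                arr[i + j] = left[i];
                i++;
            } else {
                // right[j] is smaller than every remaining element of left.
                arr[i + j] = right[j];
                count += left.length - i;
                j++;
            }
        }
        return count;
    }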
    long invCount(int[] arr) {
        if (arr.length < 2)
            return 0;
        int m = (arr.length + 1) / 2;
        int[] left = Arrays.copyOfRange(arr, 0, m);
        int[] right = Arrays.copyOfRange(arr, m, arr.length);
        return invCount(left) + invCount(right) + merge(arr, left, right);
    }
Test:
For {1, 5, 6, 7} and {6, 7, 1, 5} the result is 2.
For {4, 7, 8, 3, 1, 2} and {3, 8, 7, 1, 2, 4} the result is 7.
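These two results can be cross-checked against the plain O(n^2) definition. The following is only a sketch for verification: the class name BruteForceCheck and the method signature are my own, and because the test ids are not limited to [1, n], the lookup array is sized by the largest id (an assumption that ids are small positive integers).

```java
class BruteForceCheck {
    // Counts the pairs of movies that both lists rank in the same order, in O(n^2).
    static long similarity(int[] me, int[] peter) {
        int maxId = 0;
        for (int id : peter) maxId = Math.max(maxId, id);

        // peterRank[id] = position of movie id in Peter's list.
        int[] peterRank = new int[maxId + 1];
        for (int i = 0; i < peter.length; i++) peterRank[peter[i]] = i;

        long count = 0;
        for (int i = 0; i < me.length; i++) {
            for (int j = i + 1; j < me.length; j++) {
                // I rank me[i] before me[j]; check whether Peter does too.
                if (peterRank[me[i]] < peterRank[me[j]]) count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(similarity(new int[]{1, 5, 6, 7}, new int[]{6, 7, 1, 5}));             // prints 2
        System.out.println(similarity(new int[]{4, 7, 8, 3, 1, 2}, new int[]{3, 8, 7, 1, 2, 4})); // prints 7
    }
}
```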
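Explanation: if we built peterTemp like this:
    peterTemp[i] = map.get(peter[i]);
then every pair i, j with i > j and peterTemp[i] > peterTemp[j] would be one similarity. I instead built it as peterTemp[peterTemp.length - (i + 1)] = map.get(peter[i]); (that is, reversed), so that each such pair becomes an inversion and the referenced code can be used unchanged.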
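If you would rather count the same-order pairs directly, without reversing the array, a Fenwick tree (binary indexed tree) is one possible alternative. This is a different technique from the merge-sort inversion count above, not part of the original answer; the class name FenwickSimilarity and its signature are hypothetical. It should also run in O(n log n).

```java
import java.util.HashMap;
import java.util.Map;

class FenwickSimilarity {
    // Counts the pairs of movies that both lists rank in the same order.
    static long similarity(int[] me, int[] peter) {
        int n = me.length;

        // Rank of each movie id in my list (a map, so ids need not be 1..n).
        Map<Integer, Integer> myRank = new HashMap<>();
        for (int i = 0; i < n; i++) myRank.put(me[i], i);

        int[] tree = new int[n + 1]; // 1-based Fenwick tree indexed by my rank
        long count = 0;
        for (int id : peter) {
            int r = myRank.get(id) + 1;
            // Prefix sum over ranks 1..r-1: movies Peter already listed
            // that I also rank before this one.
            for (int k = r - 1; k > 0; k -= k & -k) count += tree[k];
            // Mark this movie as seen.
            for (int k = r; k <= n; k += k & -k) tree[k]++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(similarity(new int[]{1, 5, 6, 7}, new int[]{6, 7, 1, 5}));             // prints 2
        System.out.println(similarity(new int[]{4, 7, 8, 3, 1, 2}, new int[]{3, 8, 7, 1, 2, 4})); // prints 7
    }
}
```

The merge-sort version needs no extra data structure, while this one processes Peter's list in a single pass; which is preferable is mostly a matter of taste.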