RavenDB:如何在 map-reduce 中正确索引笛卡尔积?
RavenDB: How can I properly index a cartesian product in a map-reduce?
这个问题是 的衍生问题,但我意识到,问题是另一个问题。
考虑我极其简化的域,重写为电影租赁店场景进行抽象:
public class User
{
public string Id { get; set; }
}
public class Movie
{
public string Id { get; set; }
}
public class MovieRental
{
public string Id { get; set; }
public string MovieId { get; set; }
public string UserId { get; set; }
}
这是教科书上的多对多示例。
我要创建的索引是这样的:
For a given user, give me a list of every movie in the database (filtering/search left out for the moment) along with an integer describing how many times (or zero) the user has rented this movie.
基本上是这样的:
用户:
| Id |
|--------|
| John |
| Lizzie |
| Albert |
电影:
| Id |
|--------------|
| Robocop |
| Notting Hill |
| Inception |
电影租赁:
| Id | UserId | MovieId |
|-----------|--------|--------------|
| rental-00 | John | Robocop |
| rental-01 | John | Notting Hill |
| rental-02 | John | Notting Hill |
| rental-03 | Lizzie | Robocop |
| rental-04 | Lizzie | Robocop |
| rental-05 | Lizzie | Inception |
理想情况下,我想要一个索引来查询,它看起来像这样:
| UserId | MovieId | RentalCount |
|--------|--------------|-------------|
| John | Robocop | 1 |
| John | Notting Hill | 2 |
| John | Inception | 0 |
| Lizzie | Robocop | 2 |
| Lizzie | Notting Hill | 0 |
| Lizzie | Inception | 1 |
| Albert | Robocop | 0 |
| Albert | Notting Hill | 0 |
| Albert | Inception | 0 |
或声明式:
- 我一直想要所有电影的完整列表(最终我会添加 filtering/searching)- 即使向从未租过一部电影的用户提供
- 我想要计算每个用户的租金,只是整数
- 我希望能够按租借次数排序 - 即在列表顶部显示给定用户租借最多的电影
但是,我找不到制作上面的"cross-join"并将其保存在索引中的方法。相反,我最初认为我用下面的这个操作做对了,但它不允许我排序(参见失败的测试):
{"Not supported computation: x.UserRentalCounts.SingleOrDefault(rentalCount => (rentalCount.UserId == value(UnitTestProject2.MovieRentalTests+<>c__DisplayClass0_0).user_john.Id)).Count. You cannot use computation in RavenDB queries (only simple member expressions are allowed)."}
我的问题基本上是:我如何 - 或者我完全可以 - 索引以便满足我的要求?
下面是我提到的例子,它不符合我的要求,但这就是我现在的情况。它使用以下包(VS2015):
packages.config
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="Microsoft.Owin.Host.HttpListener" version="3.0.1" targetFramework="net461" />
<package id="NUnit" version="3.5.0" targetFramework="net461" />
<package id="RavenDB.Client" version="3.5.2" targetFramework="net461" />
<package id="RavenDB.Database" version="3.5.2" targetFramework="net461" />
<package id="RavenDB.Tests.Helpers" version="3.5.2" targetFramework="net461" />
</packages>
MovieRentalTests.cs
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Client.Indexes;
using Raven.Client.Linq;
using Raven.Tests.Helpers;
namespace UnitTestProject2
{
[TestFixture]
public class MovieRentalTests : RavenTestBase
{
[Test]
public void DoSomeTests()
{
using (var server = GetNewServer())
using (var store = NewRemoteDocumentStore(ravenDbServer: server))
{
//Test-data
var user_john = new User { Id = "John" };
var user_lizzie = new User { Id = "Lizzie" };
var user_albert = new User { Id = "Albert" };
var movie_robocop = new Movie { Id = "Robocop" };
var movie_nottingHill = new Movie { Id = "Notting Hill" };
var movie_inception = new Movie { Id = "Inception" };
var rentals = new List<MovieRental>
{
new MovieRental {Id = "rental-00", UserId = user_john.Id, MovieId = movie_robocop.Id},
new MovieRental {Id = "rental-01", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
new MovieRental {Id = "rental-02", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
new MovieRental {Id = "rental-03", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
new MovieRental {Id = "rental-04", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
new MovieRental {Id = "rental-05", UserId = user_lizzie.Id, MovieId = movie_inception.Id}
};
//Init index
new Movies_WithRentalsByUsersCount().Execute(store);
//Insert test-data in db
using (var session = store.OpenSession())
{
session.Store(user_john);
session.Store(user_lizzie);
session.Store(user_albert);
session.Store(movie_robocop);
session.Store(movie_nottingHill);
session.Store(movie_inception);
foreach (var rental in rentals)
{
session.Store(rental);
}
session.SaveChanges();
WaitForAllRequestsToComplete(server);
WaitForIndexing(store);
}
//Test of correct rental-counts for users
using (var session = store.OpenSession())
{
var allMoviesWithRentalCounts =
session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
.ToList();
var robocopWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_robocop.Id);
Assert.AreEqual(1, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
Assert.AreEqual(2, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
Assert.AreEqual(0, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
var nottingHillWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_nottingHill.Id);
Assert.AreEqual(2, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
}
// Test that you for a given user can sort the movies by view-count
using (var session = store.OpenSession())
{
var allMoviesWithRentalCounts =
session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
.OrderByDescending(x => x.UserRentalCounts.SingleOrDefault(rentalCount => rentalCount.UserId == user_john.Id).Count)
.ToList();
Assert.AreEqual(movie_nottingHill.Id, allMoviesWithRentalCounts[0].MovieId);
Assert.AreEqual(movie_robocop.Id, allMoviesWithRentalCounts[1].MovieId);
Assert.AreEqual(movie_inception.Id, allMoviesWithRentalCounts[2].MovieId);
}
}
}
public class Movies_WithRentalsByUsersCount :
AbstractMultiMapIndexCreationTask<Movies_WithRentalsByUsersCount.ReducedResult>
{
public Movies_WithRentalsByUsersCount()
{
AddMap<MovieRental>(rentals =>
from r in rentals
select new ReducedResult
{
MovieId = r.MovieId,
UserRentalCounts = new[] { new UserRentalCount { UserId = r.UserId, Count = 1 } }
});
AddMap<Movie>(movies =>
from m in movies
select new ReducedResult
{
MovieId = m.Id,
UserRentalCounts = new[] { new UserRentalCount { UserId = null, Count = 0 } }
});
Reduce = results =>
from result in results
group result by result.MovieId
into g
select new
{
MovieId = g.Key,
UserRentalCounts = (
from userRentalCount in g.SelectMany(x => x.UserRentalCounts)
group userRentalCount by userRentalCount.UserId
into subGroup
select new UserRentalCount { UserId = subGroup.Key, Count = subGroup.Sum(b => b.Count) })
.ToArray()
};
}
public class ReducedResult
{
public string MovieId { get; set; }
public UserRentalCount[] UserRentalCounts { get; set; }
}
public class UserRentalCount
{
public string UserId { get; set; }
public int Count { get; set; }
}
}
public class User
{
public string Id { get; set; }
}
public class Movie
{
public string Id { get; set; }
}
public class MovieRental
{
public string Id { get; set; }
public string MovieId { get; set; }
public string UserId { get; set; }
}
}
}
由于您的要求是 "for a given user",如果您真的只想寻找单个用户,则可以使用 Multi-Map 索引来实现。使用 Movies table 本身生成基线 zero-count 记录,然后在其上为用户映射实际的 MovieRentals 记录。
如果你真的需要它来满足所有看过所有电影的用户,我认为没有办法用 RavenDB 干净地做到这一点,因为这会被认为是 reporting which is noted as one of the sour spots for RavenDB。
如果您真的想尝试使用 RavenDB 执行此操作,这里有一些选项:
1) 在数据库中为每个用户和每部电影创建虚拟记录,并在索引中使用这些记录,计数为 0。每当电影或用户 added/updated/deleted 时,相应地更新虚拟记录。
2) 根据请求在内存中生成自己的 zero-count 记录,并将该数据与 RavenDB 返回给您的 non-zero 计数的数据合并。查询所有用户,查询所有电影,创建基线 zero-count 记录,然后对 non-zero 计数进行实际查询并将其放在最上面。最后,应用 paging/filtering/sorting 逻辑。
3) 使用 SQL 复制包将用户、电影和 MovieRental table 复制到 SQL 并为此使用 SQL "reporting" 查询.
这个问题是
考虑我极其简化的域,重写为电影租赁店场景进行抽象:
public class User
{
public string Id { get; set; }
}
public class Movie
{
public string Id { get; set; }
}
public class MovieRental
{
public string Id { get; set; }
public string MovieId { get; set; }
public string UserId { get; set; }
}
这是教科书上的多对多示例。
我要创建的索引是这样的:
For a given user, give me a list of every movie in the database (filtering/search left out for the moment) along with an integer describing how many times (or zero) the user has rented this movie.
基本上是这样的:
用户:
| Id |
|--------|
| John |
| Lizzie |
| Albert |
电影:
| Id |
|--------------|
| Robocop |
| Notting Hill |
| Inception |
电影租赁:
| Id | UserId | MovieId |
|-----------|--------|--------------|
| rental-00 | John | Robocop |
| rental-01 | John | Notting Hill |
| rental-02 | John | Notting Hill |
| rental-03 | Lizzie | Robocop |
| rental-04 | Lizzie | Robocop |
| rental-05 | Lizzie | Inception |
理想情况下,我想要一个索引来查询,它看起来像这样:
| UserId | MovieId | RentalCount |
|--------|--------------|-------------|
| John | Robocop | 1 |
| John | Notting Hill | 2 |
| John | Inception | 0 |
| Lizzie | Robocop | 2 |
| Lizzie | Notting Hill | 0 |
| Lizzie | Inception | 1 |
| Albert | Robocop | 0 |
| Albert | Notting Hill | 0 |
| Albert | Inception | 0 |
或声明式:
- 我一直想要所有电影的完整列表(最终我会添加 filtering/searching)- 即使向从未租过一部电影的用户提供
- 我想要计算每个用户的租金,只是整数
- 我希望能够按租借次数排序 - 即在列表顶部显示给定用户租借最多的电影
但是,我找不到制作上面的"cross-join"并将其保存在索引中的方法。相反,我最初认为我用下面的这个操作做对了,但它不允许我排序(参见失败的测试):
{"Not supported computation: x.UserRentalCounts.SingleOrDefault(rentalCount => (rentalCount.UserId == value(UnitTestProject2.MovieRentalTests+<>c__DisplayClass0_0).user_john.Id)).Count. You cannot use computation in RavenDB queries (only simple member expressions are allowed)."}
我的问题基本上是:我如何 - 或者我完全可以 - 索引以便满足我的要求?
下面是我提到的例子,它不符合我的要求,但这就是我现在的情况。它使用以下包(VS2015):
packages.config
<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="Microsoft.Owin.Host.HttpListener" version="3.0.1" targetFramework="net461" />
<package id="NUnit" version="3.5.0" targetFramework="net461" />
<package id="RavenDB.Client" version="3.5.2" targetFramework="net461" />
<package id="RavenDB.Database" version="3.5.2" targetFramework="net461" />
<package id="RavenDB.Tests.Helpers" version="3.5.2" targetFramework="net461" />
</packages>
MovieRentalTests.cs
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Client.Indexes;
using Raven.Client.Linq;
using Raven.Tests.Helpers;
namespace UnitTestProject2
{
[TestFixture]
public class MovieRentalTests : RavenTestBase
{
[Test]
public void DoSomeTests()
{
using (var server = GetNewServer())
using (var store = NewRemoteDocumentStore(ravenDbServer: server))
{
//Test-data
var user_john = new User { Id = "John" };
var user_lizzie = new User { Id = "Lizzie" };
var user_albert = new User { Id = "Albert" };
var movie_robocop = new Movie { Id = "Robocop" };
var movie_nottingHill = new Movie { Id = "Notting Hill" };
var movie_inception = new Movie { Id = "Inception" };
var rentals = new List<MovieRental>
{
new MovieRental {Id = "rental-00", UserId = user_john.Id, MovieId = movie_robocop.Id},
new MovieRental {Id = "rental-01", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
new MovieRental {Id = "rental-02", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
new MovieRental {Id = "rental-03", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
new MovieRental {Id = "rental-04", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
new MovieRental {Id = "rental-05", UserId = user_lizzie.Id, MovieId = movie_inception.Id}
};
//Init index
new Movies_WithRentalsByUsersCount().Execute(store);
//Insert test-data in db
using (var session = store.OpenSession())
{
session.Store(user_john);
session.Store(user_lizzie);
session.Store(user_albert);
session.Store(movie_robocop);
session.Store(movie_nottingHill);
session.Store(movie_inception);
foreach (var rental in rentals)
{
session.Store(rental);
}
session.SaveChanges();
WaitForAllRequestsToComplete(server);
WaitForIndexing(store);
}
//Test of correct rental-counts for users
using (var session = store.OpenSession())
{
var allMoviesWithRentalCounts =
session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
.ToList();
var robocopWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_robocop.Id);
Assert.AreEqual(1, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
Assert.AreEqual(2, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
Assert.AreEqual(0, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
var nottingHillWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_nottingHill.Id);
Assert.AreEqual(2, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
}
// Test that you for a given user can sort the movies by view-count
using (var session = store.OpenSession())
{
var allMoviesWithRentalCounts =
session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
.OrderByDescending(x => x.UserRentalCounts.SingleOrDefault(rentalCount => rentalCount.UserId == user_john.Id).Count)
.ToList();
Assert.AreEqual(movie_nottingHill.Id, allMoviesWithRentalCounts[0].MovieId);
Assert.AreEqual(movie_robocop.Id, allMoviesWithRentalCounts[1].MovieId);
Assert.AreEqual(movie_inception.Id, allMoviesWithRentalCounts[2].MovieId);
}
}
}
public class Movies_WithRentalsByUsersCount :
AbstractMultiMapIndexCreationTask<Movies_WithRentalsByUsersCount.ReducedResult>
{
public Movies_WithRentalsByUsersCount()
{
AddMap<MovieRental>(rentals =>
from r in rentals
select new ReducedResult
{
MovieId = r.MovieId,
UserRentalCounts = new[] { new UserRentalCount { UserId = r.UserId, Count = 1 } }
});
AddMap<Movie>(movies =>
from m in movies
select new ReducedResult
{
MovieId = m.Id,
UserRentalCounts = new[] { new UserRentalCount { UserId = null, Count = 0 } }
});
Reduce = results =>
from result in results
group result by result.MovieId
into g
select new
{
MovieId = g.Key,
UserRentalCounts = (
from userRentalCount in g.SelectMany(x => x.UserRentalCounts)
group userRentalCount by userRentalCount.UserId
into subGroup
select new UserRentalCount { UserId = subGroup.Key, Count = subGroup.Sum(b => b.Count) })
.ToArray()
};
}
public class ReducedResult
{
public string MovieId { get; set; }
public UserRentalCount[] UserRentalCounts { get; set; }
}
public class UserRentalCount
{
public string UserId { get; set; }
public int Count { get; set; }
}
}
public class User
{
public string Id { get; set; }
}
public class Movie
{
public string Id { get; set; }
}
public class MovieRental
{
public string Id { get; set; }
public string MovieId { get; set; }
public string UserId { get; set; }
}
}
}
由于您的要求是 "for a given user",如果您真的只想寻找单个用户,则可以使用 Multi-Map 索引来实现。使用 Movies table 本身生成基线 zero-count 记录,然后在其上为用户映射实际的 MovieRentals 记录。
如果你真的需要它来满足所有看过所有电影的用户,我认为没有办法用 RavenDB 干净地做到这一点,因为这会被认为是 reporting which is noted as one of the sour spots for RavenDB。
如果您真的想尝试使用 RavenDB 执行此操作,这里有一些选项:
1) 在数据库中为每个用户和每部电影创建虚拟记录,并在索引中使用这些记录,计数为 0。每当电影或用户 added/updated/deleted 时,相应地更新虚拟记录。
2) 根据请求在内存中生成自己的 zero-count 记录,并将该数据与 RavenDB 返回给您的 non-zero 计数的数据合并。查询所有用户,查询所有电影,创建基线 zero-count 记录,然后对 non-zero 计数进行实际查询并将其放在最上面。最后,应用 paging/filtering/sorting 逻辑。
3) 使用 SQL 复制包将用户、电影和 MovieRental table 复制到 SQL 并为此使用 SQL "reporting" 查询.