RavenDB: How can I properly index a cartesian product in a map-reduce?

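This question is a spin-off of an earlier one, but I have realized that the actual problem is a different one.

Consider my heavily simplified domain, rewritten as a movie rental store scenario for the sake of abstraction: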

public class User
{
    public string Id { get; set; }
}

public class Movie
{
    public string Id { get; set; }
}

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

This is a textbook many-to-many example.

The index I want to create is this:

For a given user, give me a list of every movie in the database (filtering/search left out for the moment) along with an integer describing how many times (or zero) the user has rented this movie.

Basically, something like this:

Users:

| Id     |
|--------|
| John   |
| Lizzie |
| Albert |

Movies:

| Id           |
|--------------|
| Robocop      |
| Notting Hill |
| Inception    |

MovieRentals:

| Id        | UserId | MovieId      |
|-----------|--------|--------------|
| rental-00 | John   | Robocop      |
| rental-01 | John   | Notting Hill |
| rental-02 | John   | Notting Hill |
| rental-03 | Lizzie | Robocop      |
| rental-04 | Lizzie | Robocop      |
| rental-05 | Lizzie | Inception    |

Ideally, I want an index to query against that looks like this:

| UserId | MovieId      | RentalCount |
|--------|--------------|-------------|
| John   | Robocop      | 1           |
| John   | Notting Hill | 2           |
| John   | Inception    | 0           |
| Lizzie | Robocop      | 2           |
| Lizzie | Notting Hill | 0           |
| Lizzie | Inception    | 1           |
| Albert | Robocop      | 0           |
| Albert | Notting Hill | 0           |
| Albert | Inception    | 0           |

Or, declaratively:

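However, I cannot find a way to produce the "cross-join" above and persist it in an index. Instead, I initially thought I had it right with the map-reduce index shown further down, but it does not allow me to sort (see the failing test):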

{"Not supported computation: x.UserRentalCounts.SingleOrDefault(rentalCount => (rentalCount.UserId == value(UnitTestProject2.MovieRentalTests+<>c__DisplayClass0_0).user_john.Id)).Count. You cannot use computation in RavenDB queries (only simple member expressions are allowed)."}

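My question is basically: how can I (or can I at all) build an index that meets my requirements?


Below is the example I mentioned. It does not meet my requirements, but it is where I am right now. It uses the following packages (VS2015):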

packages.config

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="Microsoft.Owin.Host.HttpListener" version="3.0.1" targetFramework="net461" />
  <package id="NUnit" version="3.5.0" targetFramework="net461" />
  <package id="RavenDB.Client" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Database" version="3.5.2" targetFramework="net461" />
  <package id="RavenDB.Tests.Helpers" version="3.5.2" targetFramework="net461" />
</packages>

MovieRentalTests.cs

using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;
using Raven.Client.Indexes;
using Raven.Client.Linq;
using Raven.Tests.Helpers;

namespace UnitTestProject2
{
    [TestFixture]
    public class MovieRentalTests : RavenTestBase
    {
        [Test]
        public void DoSomeTests()
        {
            using (var server = GetNewServer())
            using (var store = NewRemoteDocumentStore(ravenDbServer: server))
            {
                //Test-data
                var user_john = new User { Id = "John" };
                var user_lizzie = new User { Id = "Lizzie" };
                var user_albert = new User { Id = "Albert" };


                var movie_robocop = new Movie { Id = "Robocop" };
                var movie_nottingHill = new Movie { Id = "Notting Hill" };
                var movie_inception = new Movie { Id = "Inception" };

                var rentals = new List<MovieRental>
                {
                    new MovieRental {Id = "rental-00", UserId = user_john.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-01", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-02", UserId = user_john.Id, MovieId = movie_nottingHill.Id},
                    new MovieRental {Id = "rental-03", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-04", UserId = user_lizzie.Id, MovieId = movie_robocop.Id},
                    new MovieRental {Id = "rental-05", UserId = user_lizzie.Id, MovieId = movie_inception.Id}
                };

                //Init index
                new Movies_WithRentalsByUsersCount().Execute(store);

                //Insert test-data in db
                using (var session = store.OpenSession())
                {
                    session.Store(user_john);
                    session.Store(user_lizzie);
                    session.Store(user_albert);

                    session.Store(movie_robocop);
                    session.Store(movie_nottingHill);
                    session.Store(movie_inception);

                    foreach (var rental in rentals)
                    {
                        session.Store(rental);
                    }

                    session.SaveChanges();

                    WaitForAllRequestsToComplete(server);
                    WaitForIndexing(store);
                }

                //Test of correct rental-counts for users
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .ToList();

                    var robocopWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_robocop.Id);
                    Assert.AreEqual(1, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(2, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, robocopWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);

                    var nottingHillWithRentalsCounts = allMoviesWithRentalCounts.Single(m => m.MovieId == movie_nottingHill.Id);
                    Assert.AreEqual(2, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_john.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_lizzie.Id)?.Count ?? 0);
                    Assert.AreEqual(0, nottingHillWithRentalsCounts.UserRentalCounts.FirstOrDefault(x => x.UserId == user_albert.Id)?.Count ?? 0);
                }

                // Test that, for a given user, the movies can be sorted by rental count (this is the sort that fails)
                using (var session = store.OpenSession())
                {
                    var allMoviesWithRentalCounts =
                        session.Query<Movies_WithRentalsByUsersCount.ReducedResult, Movies_WithRentalsByUsersCount>()
                            .OrderByDescending(x => x.UserRentalCounts.SingleOrDefault(rentalCount => rentalCount.UserId == user_john.Id).Count)
                            .ToList();

                    Assert.AreEqual(movie_nottingHill.Id, allMoviesWithRentalCounts[0].MovieId);
                    Assert.AreEqual(movie_robocop.Id, allMoviesWithRentalCounts[1].MovieId);
                    Assert.AreEqual(movie_inception.Id, allMoviesWithRentalCounts[2].MovieId);
                }
            }
        }

        public class Movies_WithRentalsByUsersCount :
            AbstractMultiMapIndexCreationTask<Movies_WithRentalsByUsersCount.ReducedResult>
        {
            public Movies_WithRentalsByUsersCount()
            {
                AddMap<MovieRental>(rentals =>
                    from r in rentals
                    select new ReducedResult
                    {
                        MovieId = r.MovieId,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = r.UserId, Count = 1 } }
                    });

                AddMap<Movie>(movies =>
                    from m in movies
                    select new ReducedResult
                    {
                        MovieId = m.Id,
                        UserRentalCounts = new[] { new UserRentalCount { UserId = null, Count = 0 } }
                    });

                Reduce = results =>
                    from result in results
                    group result by result.MovieId
                    into g
                    select new
                    {
                        MovieId = g.Key,
                        UserRentalCounts = (
                                from userRentalCount in g.SelectMany(x => x.UserRentalCounts)
                                group userRentalCount by userRentalCount.UserId
                                into subGroup
                                select new UserRentalCount { UserId = subGroup.Key, Count = subGroup.Sum(b => b.Count) })
                            .ToArray()
                    };
            }

            public class ReducedResult
            {
                public string MovieId { get; set; }
                public UserRentalCount[] UserRentalCounts { get; set; }
            }

            public class UserRentalCount
            {
                public string UserId { get; set; }
                public int Count { get; set; }
            }
        }

        public class User
        {
            public string Id { get; set; }
        }

        public class Movie
        {
            public string Id { get; set; }
        }

        public class MovieRental
        {
            public string Id { get; set; }
            public string MovieId { get; set; }
            public string UserId { get; set; }
        }
    }
}

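Since your requirement says "for a given user", if you really are only looking at a single user, you can do this with a Multi-Map index. Use the Movies table itself to produce the baseline zero-count records, and then map the actual MovieRentals records for the user on top of that.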
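
To make the baseline-plus-rentals idea concrete, here is a minimal in-memory sketch using plain LINQ to Objects. It is not a RavenDB index definition; it only mirrors the two "map" steps and the "reduce" step described above. The class `SingleUserRentalCounts`, the method `ForUser`, and the result type `MovieRentalCount` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Movie { public string Id { get; set; } }

public class MovieRental
{
    public string Id { get; set; }
    public string MovieId { get; set; }
    public string UserId { get; set; }
}

// Hypothetical result shape: one row per movie for the chosen user.
public class MovieRentalCount
{
    public string MovieId { get; set; }
    public int RentalCount { get; set; }
}

public static class SingleUserRentalCounts
{
    // Hypothetical helper (not a RavenDB API): mimics map/map/reduce in memory.
    public static List<MovieRentalCount> ForUser(
        string userId, IEnumerable<Movie> movies, IEnumerable<MovieRental> rentals)
    {
        // "Map" 1: every movie contributes a baseline entry with a count of 0.
        var baseline = movies.Select(m => new { MovieId = m.Id, Count = 0 });

        // "Map" 2: every rental by the given user contributes a count of 1.
        var rented = rentals
            .Where(r => r.UserId == userId)
            .Select(r => new { MovieId = r.MovieId, Count = 1 });

        // "Reduce": group by movie, sum the counts, then sort by count.
        return baseline.Concat(rented)
            .GroupBy(x => x.MovieId)
            .Select(g => new MovieRentalCount { MovieId = g.Key, RentalCount = g.Sum(x => x.Count) })
            .OrderByDescending(x => x.RentalCount)
            .ToList();
    }

    public static void Main()
    {
        var movies = new[] { "Robocop", "Notting Hill", "Inception" }
            .Select(id => new Movie { Id = id }).ToList();
        var rentals = new List<MovieRental>
        {
            new MovieRental { Id = "rental-00", UserId = "John", MovieId = "Robocop" },
            new MovieRental { Id = "rental-01", UserId = "John", MovieId = "Notting Hill" },
            new MovieRental { Id = "rental-02", UserId = "John", MovieId = "Notting Hill" },
            new MovieRental { Id = "rental-03", UserId = "Lizzie", MovieId = "Robocop" }
        };

        foreach (var row in ForUser("John", movies, rentals))
            Console.WriteLine(row.MovieId + ": " + row.RentalCount);
        // Prints: Notting Hill: 2, Robocop: 1, Inception: 0 (one per line)
    }
}
```

Because every movie emits a zero-count entry, movies the user never rented still appear in the result, which is exactly what a rentals-only map cannot provide.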

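If you really need it for all users across all movies, I don't believe there is a clean way to do this with RavenDB, as this would be considered reporting, which is noted as one of the sour spots for RavenDB.

If you really want to try to do this with RavenDB, here are some options: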

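1) Create placeholder records in the database for every user and every movie, and use those in the index with a count of 0. Whenever a movie or user is added, updated, or deleted, update the placeholder records accordingly.

2) Generate your own zero-count records in memory on request, and merge that data with the non-zero counts that RavenDB returns to you. Query for all users, query for all movies, create the baseline zero-count records, then run the actual query for the non-zero counts and layer those on top. Finally, apply the paging/filtering/sorting logic.

3) Use the SQL Replication bundle to replicate the User, Movie, and MovieRental tables to SQL, and use a SQL "reporting" query for this.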
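
Option 2 can be sketched in memory as follows. This is only an illustration under stated assumptions: the non-zero counts are hard-coded here to stand in for whatever a grouped query against RavenDB would return, and `CrossJoinRentalCounts`, `Merge`, and `UserMovieCount` are hypothetical names, not part of any RavenDB API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type for the merged result.
public class UserMovieCount
{
    public string UserId { get; set; }
    public string MovieId { get; set; }
    public int RentalCount { get; set; }
}

public static class CrossJoinRentalCounts
{
    // Builds the full users x movies grid and overlays the known non-zero counts.
    public static List<UserMovieCount> Merge(
        IEnumerable<string> userIds,
        IEnumerable<string> movieIds,
        IEnumerable<UserMovieCount> nonZeroCounts)
    {
        // Index the non-zero counts by (user, movie) for constant-time lookup.
        var lookup = nonZeroCounts.ToDictionary(
            c => Tuple.Create(c.UserId, c.MovieId),
            c => c.RentalCount);

        // Cross join: one row per (user, movie) pair, defaulting to 0.
        return (from u in userIds
                from m in movieIds
                let key = Tuple.Create(u, m)
                select new UserMovieCount
                {
                    UserId = u,
                    MovieId = m,
                    RentalCount = lookup.ContainsKey(key) ? lookup[key] : 0
                }).ToList();
    }

    public static void Main()
    {
        var users = new[] { "John", "Lizzie", "Albert" };
        var movies = new[] { "Robocop", "Notting Hill", "Inception" };

        // Assumption: stands in for the non-zero counts returned by the database.
        var nonZero = new List<UserMovieCount>
        {
            new UserMovieCount { UserId = "John", MovieId = "Robocop", RentalCount = 1 },
            new UserMovieCount { UserId = "John", MovieId = "Notting Hill", RentalCount = 2 },
            new UserMovieCount { UserId = "Lizzie", MovieId = "Robocop", RentalCount = 2 },
            new UserMovieCount { UserId = "Lizzie", MovieId = "Inception", RentalCount = 1 }
        };

        // Sorting (and any paging or filtering) happens last, on the merged rows.
        var rows = Merge(users, movies, nonZero)
            .Where(r => r.UserId == "John")
            .OrderByDescending(r => r.RentalCount);

        foreach (var r in rows)
            Console.WriteLine(r.UserId + " | " + r.MovieId + " | " + r.RentalCount);
    }
}
```

The trade-off is that the whole grid is materialized on the client, so this approach is only reasonable while the number of users times the number of movies stays small enough to hold in memory.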