PostgreSQL conditional join and result filtering

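What I want

The names of cast members who have acted in more than one movie by the same director (not counting movies they directed themselves) and who have directed at least one movie themselves.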

select * from moviesdirectors LIMIT 5

First 5 rows of moviesdirectors

movieid   director
6       Bob Persichetti
6       Peter Ramsey
6       Rodney Rothman
16      Sergio Leone
20      Spike Jonze

select * from moviescast LIMIT 5

First 5 rows of moviescast

movieid castname
427     Quei Tann
2878    Indrans
3272    Togo Igawa
517     Ajay Naidu
1608    Megan Sousa

What I tried:

select * from moviescast LEFT JOIN moviesdirectors ON moviesdirectors.movieid=moviescast.movieid
where castname in (select director from moviesdirectors) AND castname!=director 
ORDER BY castname

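If I understand your question correctly, you want to find the people in the cast who:

  1. have directed one or more movies
  2. have acted in more than one movie by the same director, not counting movies on which they are credited as (co-)director.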

If that is the case, I reproduced a dataset similar to yours with:

create table moviesdirectors( movieid int, director varchar);
 create table moviescast( movieid int, castname varchar);
 
 insert into moviesdirectors values(6, 'Bob Persichetti');
 insert into moviesdirectors values(6, 'Peter Ramsey');
 insert into moviesdirectors values(6, 'Rodney Rothman');
 insert into moviesdirectors values(10, 'Sergio Leone');
 insert into moviesdirectors values(10, 'Spike Jonze');
 insert into moviesdirectors values(20, 'Spike Jonze');

 insert into moviescast values(6, 'Bob Persichetti');
 insert into moviescast values(20, 'Bob Persichetti');
 insert into moviescast values(10, 'Bob Persichetti');
 insert into moviescast values(6, 'Quei Tann');
 insert into moviescast values(20, 'Quei Tann');
 insert into moviescast values(10, 'Quei Tann');
 insert into moviescast values(6, 'Sergio Leone');
 insert into moviescast values(20, 'Sergio Leone');
 insert into moviescast values(6, 'Peter Ramsey');
 insert into moviescast values(20, 'Peter Ramsey');

The moviesdirectors table now contains the following

 movieid |    director     
---------+-----------------
       6 | Bob Persichetti
       6 | Peter Ramsey
       6 | Rodney Rothman
      10 | Sergio Leone
      10 | Spike Jonze
      20 | Spike Jonze
(6 rows)

and the moviescast table the following

 movieid |    castname     
---------+-----------------
       6 | Bob Persichetti
      20 | Bob Persichetti
      10 | Bob Persichetti
       6 | Quei Tann
      20 | Quei Tann
      10 | Quei Tann
       6 | Sergio Leone
      20 | Sergio Leone
       6 | Peter Ramsey
      20 | Peter Ramsey
(10 rows)

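In the dataset above, only Bob Persichetti's work with Spike Jonze should satisfy the 2 conditions stated initially.

All the others should not, because:

  • Quei Tann never directed a movie
  • Sergio Leone acted in two movies, but they have different directors
  • Peter Ramsey acted in two movies, but one of them (movieid=6) was also directed by himself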
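
The reasoning above can be checked against the sample data with a quick exploratory query. This is only a sketch, assuming the two tables created earlier; the alias sd and the column name co_directed are made up for illustration.

```sql
-- For every cast member who has directed at least one movie, list each
-- movie they acted in, that movie's director(s), and whether the cast
-- member is also credited as a director of that same movie.
select c.castname,
       c.movieid,
       d.director,
       exists (select 1
               from moviesdirectors sd
               where sd.movieid = c.movieid
                 and sd.director = c.castname) as co_directed
from moviescast c
join moviesdirectors d on d.movieid = c.movieid
where c.castname in (select director from moviesdirectors)
order by c.castname, c.movieid, d.director;
```

Rows with co_directed = true are the ones that have to be left out before counting movies per director.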

To provide the solution, I split the problem into the following steps of the query:

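  1. directed_by_themselves: retrieves the list of movieid values where the director is also part of the cast
  2. directors: retrieves the distinct list of directors
  3. the final query glues everything together and removes from the resulting dataset all movies in which the cast member is also a director
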
-- movies in which a director also appears in the cast
with directed_by_themselves as (
  select distinct moviesdirectors.movieid,
         moviesdirectors.director
  from moviesdirectors join moviescast
  on moviesdirectors.movieid = moviescast.movieid
  and moviesdirectors.director = moviescast.castname
),
-- everyone who has directed at least one movie
directors as (
  select distinct director from moviesdirectors
)
-- count movies per (director, cast member), keeping only cast members who
-- are directors themselves and skipping the movies they (co-)directed
select d.director, c.castname, count(*) nr_movies
from moviesdirectors d join moviescast c
on (d.movieid = c.movieid)
join directors dir on c.castname = dir.director
where (d.movieid, c.castname) not in (select movieid, director from directed_by_themselves)
group by d.director, c.castname
having count(*) > 1;

The result is

  director   |    castname     | nr_movies 
-------------+-----------------+-----------
 Spike Jonze | Bob Persichetti |         2
(1 row)
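
If only the names are needed, as in the original question, the same logic can be written more compactly. This is a sketch under a few assumptions: it swaps NOT IN for NOT EXISTS, which is not affected by NULL values in movieid or director, and it counts distinct movieid values so that duplicate rows do not inflate the count. The alias me is made up for illustration.

```sql
select distinct c.castname
from moviescast c
join moviesdirectors d on d.movieid = c.movieid
-- the cast member has directed at least one movie
where exists (select 1
              from moviesdirectors me
              where me.director = c.castname)
-- and is not credited as a director of this particular movie
  and not exists (select 1
                  from moviesdirectors me
                  where me.movieid = c.movieid
                    and me.director = c.castname)
group by d.director, c.castname
-- more than one movie with the same director
having count(distinct c.movieid) > 1;
```

On the sample data above this should return only Bob Persichetti.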