count(distinct) from joined tables returns duplicate/incorrect values

SQL:

SELECT COUNT(DISTINCT person.p_id) AS numberOfPeople, 
location.l_id AS location
FROM job
INNER JOIN person ON job.j_person = person.p_id
INNER JOIN (location INNER JOIN area ON location.l_area = area.a_id) ON job.j_location = location.l_id
GROUP BY area.a_name, location.l_name

Database: the 'job' table is joined with 'person' (on j_person = p_id) and with 'location' (on j_location = l_id)

Table: person (list of all people in the company, PK = p_id)
+------+--------+--
| p_id | p_name | etc.
+------+--------+--
|  01  |  John  | ...
+------+--------+--
|  02  |  Suzy  | ...
+------+--------+--
|  03  |  Mike  | ...
+------+--------+--
|  04  |  Kim   | ...
+------+--------+--


Table: job (list of all jobs, PK = j_id)
+------+----------+------------+--------+
| j_id | j_person | j_location | j_type |
+------+----------+------------+--------+
|  AB  |    02    |    cityB   | type2  |
+------+----------+------------+--------+
|  CD  |    02    |    cityA   | type3  |
+------+----------+------------+--------+
|  EF  |    01    |    cityC   | type2  |
+------+----------+------------+--------+
|  GH  |    03    |    cityB   | type1  |
+------+----------+------------+--------+
|  IJ  |    04    |    cityA   | type1  |
+------+----------+------------+--------+
|  KL  |    04    |    cityA   | type2  |
+------+----------+------------+--------+


Table: location (list of all locations, PK = l_id)
+-------+----------+--------+
| l_id  |  l_name  | l_area |
+-------+----------+--------+
| cityA | London   |  ...   |
+-------+----------+--------+
| cityB | New York |  ...   |
+-------+----------+--------+
| cityC | Brussels |  ...   |
+-------+----------+--------+

What I need:

A list of the number of people per city, which is what this SQL statement returns.

But... now on to my problem.

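The result must not contain any duplicate numbers/people. For example: Suzy (p_id = 02) has jobs in both London and New York, but for the final numbers to be correct she may only be counted in 1 of those 2 cities.

I think I am looking for a solution that eliminates any result that has already been included/counted, so that it cannot be counted again in another/the next city. When the number of people per city is summed, the result must equal the total number of records in the table 'person'.

It is not a problem if, for example, Suzy is not included in, say, New York, because locations/cities are part of a larger area, and a person only ever works in 1 area.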


I had some difficulty explaining what I want to achieve, and I am not a native English speaker, so please let me know if anything is unclear.

To do this, you first have to limit each person to 1 job before grouping. Here is one way:

with person as (select 1 p_id, 'John' p_name from dual union all
                select 2 p_id, 'Suzy' p_name from dual union all
                select 3 p_id, 'Mike' p_name from dual union all
                select 4 p_id, 'Kim' p_name from dual),
       jobs as (select 'AB' j_id, 2 j_person, 'cityB' j_location, 'type2' j_type from dual union all
                select 'CD' j_id, 2 j_person, 'cityA' j_location, 'type3' j_type from dual union all
                select 'EF' j_id, 1 j_person, 'cityC' j_location, 'type2' j_type from dual union all
                select 'GH' j_id, 3 j_person, 'cityB' j_location, 'type1' j_type from dual union all
                select 'IJ' j_id, 4 j_person, 'cityA' j_location, 'type1' j_type from dual union all
                select 'KL' j_id, 4 j_person, 'cityA' j_location, 'type2' j_type from dual),
   location as (select 'cityA' l_id, 'London' l_name from dual union all
                select 'cityB' l_id, 'New York' l_name from dual union all
                select 'cityC' l_id, 'Brussels' l_name from dual)
-- end of setting up some subqueries to mimic your tables with data in them. See SQL below:
select   location_name,
         count(distinct person_id) number_of_people
from     (select p.p_id person_id,
                 p.p_name person_name,
                 l.l_name location_name,
                 j.j_type job_type,
                 row_number() over (partition by p.p_id order by j.j_type, l.l_name) rn
          from   jobs j
                 inner join person p on j.j_person = p.p_id
                 inner join location l on j.j_location = l.l_id)
where    rn = 1
group by location_name;

LOCATION_NAME NUMBER_OF_PEOPLE
------------- ----------------
London                       1
Brussels                     1
New York                     2

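You can see that I used the row_number() analytic function to number the rows for each p_id, ordered by job type and then location name. If the logic that decides which location a person is listed under (the row with row_number = 1) differs from this, you will need to change the order by clause accordingly.

From there, it is just a matter of filtering the results to keep only the first row for each p_id, and then grouping to get the distinct count of people.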
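
If you have no Oracle instance at hand, the same idea can be replayed in SQLite through Python's sqlite3 module. Treat this as a sketch under assumptions: the in-memory tables mirror the sample data from the question, SQLite 3.25 or later is required for row_number(), and the helper name people_per_city and its order_by parameter are made up for illustration. It shows how changing the ordering moves Suzy to another city while the per-city counts still add up to the number of rows in person.

```python
import sqlite3

SETUP = """
    create table person (p_id integer primary key, p_name text);
    create table job (j_id text primary key, j_person integer,
                      j_location text, j_type text);
    create table location (l_id text primary key, l_name text);
    insert into person values (1,'John'),(2,'Suzy'),(3,'Mike'),(4,'Kim');
    insert into job values
        ('AB',2,'cityB','type2'),('CD',2,'cityA','type3'),
        ('EF',1,'cityC','type2'),('GH',3,'cityB','type1'),
        ('IJ',4,'cityA','type1'),('KL',4,'cityA','type2');
    insert into location values
        ('cityA','London'),('cityB','New York'),('cityC','Brussels');
"""

def people_per_city(order_by="j.j_type, l.l_name"):
    """Count each person once, in the city of their first job per order_by.

    order_by is pasted into the SQL text, so only pass trusted literals
    (hypothetical helper, for illustration only).
    """
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(f"""
        select location_name, count(distinct person_id) as number_of_people
        from (select p.p_id   as person_id,
                     l.l_name as location_name,
                     row_number() over (partition by p.p_id
                                        order by {order_by}) as rn
              from job j
                   inner join person p on j.j_person = p.p_id
                   inner join location l on j.j_location = l.l_id) as ranked
        where rn = 1
        group by location_name
        order by location_name
    """).fetchall()
    total_people = con.execute("select count(*) from person").fetchone()[0]
    con.close()
    return rows, total_people

# Same ordering as the query above: Suzy is counted in New York.
by_type, total = people_per_city()
# Ordering by city name only: Suzy is counted in London instead.
by_name, _ = people_per_city("l.l_name")

print(by_type, sum(n for _, n in by_type) == total)
print(by_name, sum(n for _, n in by_name) == total)
```

Whichever ordering you choose, each person survives the rn = 1 filter exactly once, so the city counts always sum to the size of person, provided every person has at least one job.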

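Oh, the joys of reporting. Do you make the per-city numbers not quite right so that they line up with a total that represents our headcount? Or do you get the cities right and have them add up to a number larger than our payroll? Because really, in this case the line items and the total do count different things, since "people who do work at this office" is not the same as "people who do work in the company".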

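Another option: fractional people!

If a person works in two cities, show them under "count of people working here" in both, but also sum up a modifier that you subtract from the total to get your overall headcount.

Example: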

with person as (select 1 p_id, 'John' p_name from dual union all
                select 2 p_id, 'Suzy' p_name from dual union all
                select 3 p_id, 'Mike' p_name from dual union all
                select 4 p_id, 'Kim' p_name from dual),
       jobs as (select 'AB' j_id, 2 j_person, 'cityB' j_location, 'type2' j_type from dual union all
                select 'CD' j_id, 2 j_person, 'cityA' j_location, 'type3' j_type from dual union all
                select 'EF' j_id, 1 j_person, 'cityC' j_location, 'type2' j_type from dual union all
                select 'GH' j_id, 3 j_person, 'cityB' j_location, 'type1' j_type from dual union all
                select 'IJ' j_id, 4 j_person, 'cityA' j_location, 'type1' j_type from dual union all
                select 'KL' j_id, 4 j_person, 'cityA' j_location, 'type2' j_type from dual),
     lctn   as (select 'cityA' l_id, 'London' l_name from dual union all
                select 'cityB' l_id, 'New York' l_name from dual union all
                select 'cityC' l_id, 'Brussels' l_name from dual)
-- end of setting up some subqueries to mimic your tables with data in them. See SQL below:
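select   location_name,
         location_jobs             number_of_distinct_jobs,
         count(distinct person_id) cnt_of_people_working_here,
         -- fraction of each shared person to subtract from the grand total;
         -- distinct keeps a person with several jobs in one city from being
         -- subtracted twice, but it also collapses equal fractions that
         -- belong to different people in the same city
         sum(distinct case when person_jobs = 1 then 0 else (1-person_jobs) end) shared_people
from     (select p.p_id person_id,
                 l.l_name location_name,
                 -- 1 / number of distinct cities this person works in
                 1/(count(distinct l_name) over (partition by p.p_id)) person_jobs,
                 -- number of distinct jobs in this city
                 count(distinct j_id)   over (partition by l_name) location_jobs
          from   jobs j
                 inner join person p on j.j_person = p.p_id
                 inner join lctn l on j.j_location = l.l_id)
group by location_name, location_jobs;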



LOCATION_NAME  NUMBER_OF_DISTINCT_JOBS  CNT_OF_PEOPLE_WORKING_HERE  SHARED_PEOPLE
London         3                        2                           0.5
Brussels       1                        1                           0
New York       2                        2                           0.5

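When it comes to your total row, if you add up cnt_of_people_working_here and subtract the sum of shared_people, you get your total payroll headcount. Do anything else and either your rows or your total will be off because, as mentioned, you are grouping at different levels.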
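
To make the arithmetic for the total row concrete, here is a small Python sketch. It takes the three result rows shown above as hard-coded input and applies the add-then-subtract rule. The function name payroll_total is hypothetical, and Fraction is used only to keep the halves exact.

```python
from fractions import Fraction

# (location_name, number_of_distinct_jobs,
#  cnt_of_people_working_here, shared_people), copied from the output above
result_rows = [
    ("London",   3, 2, Fraction(1, 2)),
    ("Brussels", 1, 1, Fraction(0)),
    ("New York", 2, 2, Fraction(1, 2)),
]

def payroll_total(rows):
    """Sum the per-city head counts, then subtract the shared fractions."""
    people = sum(row[2] for row in rows)   # 2 + 1 + 2 = 5
    shared = sum(row[3] for row in rows)   # 1/2 + 0 + 1/2 = 1
    return people - shared

print(payroll_total(result_rows))  # 4, the number of rows in person
```

Suzy appears in both London and New York, so she contributes 2 to the summed head counts and 1/2 + 1/2 to the modifier, which nets out to one person.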