count(distinct) from joined tables returns duplicate/incorrect values
SQL:
SELECT COUNT(DISTINCT person.p_id) AS numberOfPeople,
location.l_id AS location
FROM job
INNER JOIN person ON job.j_person = person.p_id
INNER JOIN (location INNER JOIN area ON location.l_area = area.a_id) ON job.j_location = location.l_id
GROUP BY area.a_name, location.l_name
Database: the 'job' table is joined with 'person' (on j_person = p_id) and 'location' (on j_location = l_id)
Table: person (list of all people in the company, PK = p_id)
+------+--------+--
| p_id | p_name | etc.
+------+--------+--
| 01 | John | ...
+------+--------+--
| 02 | Suzy | ...
+------+--------+--
| 03 | Mike | ...
+------+--------+--
| 04 | Kim | ...
+------+--------+--
Table: job (list of all jobs, PK = j_id)
+------+----------+------------+--------+
| j_id | j_person | j_location | j_type |
+------+----------+------------+--------+
| AB | 02 | cityB | type2 |
+------+----------+------------+--------+
| CD | 02 | cityA | type3 |
+------+----------+------------+--------+
| EF | 01 | cityC | type2 |
+------+----------+------------+--------+
| GH | 03 | cityB | type1 |
+------+----------+------------+--------+
| IJ | 04 | cityA | type1 |
+------+----------+------------+--------+
| KL | 04 | cityA | type2 |
+------+----------+------------+--------+
Table: location (list of all locations, PK = l_id)
+-------+----------+--------+
| l_id | l_name | l_area |
+-------+----------+----
| cityA | London | ...
+-------+----------+----
| cityB | New York | ...
+-------+----------+----
| cityC | Brussels | ...
+-------+----------+----
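What I need:
A count of people per city. This is the result of the SQL statement above:
- Area 1:
  - London: 2
  - New York: 2
- Area 2:
  - Brussels: 1
But... now to my problem.
The result must not contain any duplicate numbers/people.
For example: Suzy (p_id = 02) has a job in both London and New York, but for the final number to be correct she may only be counted in 1 of those 2 cities.
I think I am looking for a solution that eliminates anyone who has already been included/counted, so that they cannot be counted again in another/the next city.
When the numbers of people per city are summed, the result must equal the total number of records in table 'person'.
It is not a problem if, for example, Suzy is then not included in, say, New York, because locations/cities are part of a larger area, and a person only ever works in 1 area.
I had some difficulty explaining what I want to achieve, and I am not a native English speaker, so please let me know if anything is unclear.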
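To make the mismatch concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs a simplified version of the query above. It is only an illustration under some assumptions: SQLite stands in for the real database, and the `area` join is left out because the `l_area` values are not shown in the question.

```python
import sqlite3

# In-memory stand-in for the real tables (the 'area' table is omitted here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (p_id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE job (j_id TEXT PRIMARY KEY, j_person INTEGER,
                      j_location TEXT, j_type TEXT);
    CREATE TABLE location (l_id TEXT PRIMARY KEY, l_name TEXT);
    INSERT INTO person VALUES (1,'John'),(2,'Suzy'),(3,'Mike'),(4,'Kim');
    INSERT INTO job VALUES
        ('AB',2,'cityB','type2'),('CD',2,'cityA','type3'),
        ('EF',1,'cityC','type2'),('GH',3,'cityB','type1'),
        ('IJ',4,'cityA','type1'),('KL',4,'cityA','type2');
    INSERT INTO location VALUES
        ('cityA','London'),('cityB','New York'),('cityC','Brussels');
""")

# COUNT(DISTINCT ...) removes duplicates only within each group (city),
# so a person with jobs in two cities is still counted once per city.
per_city = dict(conn.execute("""
    SELECT location.l_name, COUNT(DISTINCT person.p_id)
    FROM job
    INNER JOIN person ON job.j_person = person.p_id
    INNER JOIN location ON job.j_location = location.l_id
    GROUP BY location.l_name
""").fetchall())

total_people = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]

print(per_city)                # per-city distinct counts
print(sum(per_city.values()))  # 5: Suzy is counted in London and in New York
print(total_people)            # 4 rows in 'person'
```

The per-city numbers are each correct on their own; it is only their sum that exceeds the number of people.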
To do this, you first have to restrict each person to a single job before doing the grouping. Here is one way:
with person as (select 1 p_id, 'John' p_name from dual union all
                select 2 p_id, 'Suzy' p_name from dual union all
                select 3 p_id, 'Mike' p_name from dual union all
                select 4 p_id, 'Kim' p_name from dual),
     jobs as (select 'AB' j_id, 2 j_person, 'cityB' j_location, 'type2' j_type from dual union all
              select 'CD' j_id, 2 j_person, 'cityA' j_location, 'type3' j_type from dual union all
              select 'EF' j_id, 1 j_person, 'cityC' j_location, 'type2' j_type from dual union all
              select 'GH' j_id, 3 j_person, 'cityB' j_location, 'type1' j_type from dual union all
              select 'IJ' j_id, 4 j_person, 'cityA' j_location, 'type1' j_type from dual union all
              select 'KL' j_id, 4 j_person, 'cityA' j_location, 'type2' j_type from dual),
     location as (select 'cityA' l_id, 'London' l_name from dual union all
                  select 'cityB' l_id, 'New York' l_name from dual union all
                  select 'cityC' l_id, 'Brussels' l_name from dual)
-- end of setting up some subqueries to mimic your tables with data in them. See SQL below:
select location_name,
       count(distinct person_id) number_of_people
from   (select p.p_id person_id,
               p.p_name person_name,
               l.l_name location_name,
               j.j_type job_type,
               row_number() over (partition by p.p_id order by j.j_type, l.l_name) rn
        from   jobs j
               inner join person p on j.j_person = p.p_id
               inner join location l on j.j_location = l.l_id)
where  rn = 1
group by location_name;
LOCATION_NAME NUMBER_OF_PEOPLE
------------- ----------------
London 1
Brussels 1
New York 2
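You can see that I used the row_number() analytic function to number the rows for each p_id, ordered by job type and then location name. If the logic that decides which location a person is listed under (the row where row_number = 1) is different from that, you will need to amend the order by clause accordingly.
From there, it is just a matter of filtering the results to keep only the first row for each p_id, and then grouping them to get the distinct count of people.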
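As a rough illustration of how the order by clause decides where a person lands, the sketch below runs the same row_number() pattern against SQLite from Python. The assumptions are mine, not part of the answer: SQLite 3.25 or newer (for window functions) instead of Oracle, and a hypothetical helper `people_per_city` that takes the ordering as a trusted, hard-coded string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (p_id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE jobs (j_id TEXT PRIMARY KEY, j_person INTEGER,
                       j_location TEXT, j_type TEXT);
    CREATE TABLE location (l_id TEXT PRIMARY KEY, l_name TEXT);
    INSERT INTO person VALUES (1,'John'),(2,'Suzy'),(3,'Mike'),(4,'Kim');
    INSERT INTO jobs VALUES
        ('AB',2,'cityB','type2'),('CD',2,'cityA','type3'),
        ('EF',1,'cityC','type2'),('GH',3,'cityB','type1'),
        ('IJ',4,'cityA','type1'),('KL',4,'cityA','type2');
    INSERT INTO location VALUES
        ('cityA','London'),('cityB','New York'),('cityC','Brussels');
""")


def people_per_city(order_by):
    """Hypothetical helper: count each person once, in the city whose
    row ranks first under the given ORDER BY expression.

    order_by is interpolated into the SQL text, so only pass fixed,
    trusted strings (never user input).
    """
    sql = f"""
        SELECT location_name, COUNT(DISTINCT person_id)
        FROM (SELECT p.p_id AS person_id,
                     l.l_name AS location_name,
                     ROW_NUMBER() OVER (PARTITION BY p.p_id
                                        ORDER BY {order_by}) AS rn
              FROM jobs j
              INNER JOIN person p ON j.j_person = p.p_id
              INNER JOIN location l ON j.j_location = l.l_id)
        WHERE rn = 1
        GROUP BY location_name
    """
    return dict(conn.execute(sql).fetchall())


# Same ordering as the query above: Suzy's type2 job (New York) wins.
by_type_then_name = people_per_city("j.j_type, l.l_name")
# Ordering by city name only: Suzy is listed under London instead.
by_name_only = people_per_city("l.l_name")

print(by_type_then_name)
print(by_name_only)
```

Either ordering makes the city counts add up to the 4 rows in person; only the city that Suzy is attributed to changes.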
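Oh, the joys of reporting. Do you make each city's number slightly wrong so that the cities add up to a total that matches the headcount? Or do you get each city right and let them add up to a number larger than the payroll? In reality, the line items and the total count different things here, because "people who do work at this office" is not the same as "people who do work in the company".
Another option: fractional people!
If a person works in two cities, show them in both under "count of people working here", but also sum a modifier that you subtract from the total to get your overall headcount.
Ex)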
with person as (select 1 p_id, 'John' p_name from dual union all
                select 2 p_id, 'Suzy' p_name from dual union all
                select 3 p_id, 'Mike' p_name from dual union all
                select 4 p_id, 'Kim' p_name from dual),
     jobs as (select 'AB' j_id, 2 j_person, 'cityB' j_location, 'type2' j_type from dual union all
              select 'CD' j_id, 2 j_person, 'cityA' j_location, 'type3' j_type from dual union all
              select 'EF' j_id, 1 j_person, 'cityC' j_location, 'type2' j_type from dual union all
              select 'GH' j_id, 3 j_person, 'cityB' j_location, 'type1' j_type from dual union all
              select 'IJ' j_id, 4 j_person, 'cityA' j_location, 'type1' j_type from dual union all
              select 'KL' j_id, 4 j_person, 'cityA' j_location, 'type2' j_type from dual),
     lctn as (select 'cityA' l_id, 'London' l_name from dual union all
              select 'cityB' l_id, 'New York' l_name from dual union all
              select 'cityC' l_id, 'Brussels' l_name from dual)
-- end of setting up some subqueries to mimic your tables with data in them. See SQL below:
select location_name,
       location_jobs number_of_distinct_jobs,
       count(distinct person_id) cnt_of_people_working_here,
       sum(distinct case when person_jobs = 1 then 0 else (1-person_jobs) end) shared_people
from   (select p.p_id person_id,
               l.l_name location_name,
               1/(count(distinct l_name) over (partition by p.p_id)) person_jobs,
               count(distinct j_id) over (partition by l_name) location_jobs
        from   jobs j
               inner join person p on j.j_person = p.p_id
               inner join lctn l on j.j_location = l.l_id)
group by location_name, location_jobs;
LOCATION_NAME NUMBER_OF_DISTINCT_JOBS CNT_OF_PEOPLE_WORKING_HERE SHARED_PEOPLE
"London" 3 2 0.5
"Brussels" 1 1 0
"New York" 2 2 0.5
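When it comes to your total row, if you sum cnt_of_people_working_here and subtract the sum of shared_people, you get your total payroll headcount. Anything else and either your rows or your total will be off because, as mentioned, you are grouping at different levels.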
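The reconciliation described above can be checked with a short sketch. This is not the Oracle query from this answer: SQLite does not accept count(distinct ...) in a window function, so the per-person fraction is computed with two CTEs (`person_city` and `person_share`, both names made up for this example), and the fractions are summed once per distinct person rather than with sum(distinct ...). As far as I can tell, that also avoids a caveat of sum(distinct ...): two people in the same city with the same fraction would be collapsed into one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (p_id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE jobs (j_id TEXT PRIMARY KEY, j_person INTEGER,
                       j_location TEXT, j_type TEXT);
    CREATE TABLE lctn (l_id TEXT PRIMARY KEY, l_name TEXT);
    INSERT INTO person VALUES (1,'John'),(2,'Suzy'),(3,'Mike'),(4,'Kim');
    INSERT INTO jobs VALUES
        ('AB',2,'cityB','type2'),('CD',2,'cityA','type3'),
        ('EF',1,'cityC','type2'),('GH',3,'cityB','type1'),
        ('IJ',4,'cityA','type1'),('KL',4,'cityA','type2');
    INSERT INTO lctn VALUES
        ('cityA','London'),('cityB','New York'),('cityC','Brussels');
""")

rows = conn.execute("""
    WITH person_city AS (      -- one row per (person, city) pair
             SELECT DISTINCT j_person, j_location FROM jobs),
         person_share AS (     -- 1 / number of cities the person works in
             SELECT j_person, 1.0 / COUNT(*) AS share
             FROM person_city
             GROUP BY j_person)
    SELECT l.l_name,
           COUNT(*)            AS cnt_of_people_working_here,
           SUM(1.0 - ps.share) AS shared_people
    FROM person_city pc
    INNER JOIN person_share ps ON ps.j_person = pc.j_person
    INNER JOIN lctn l ON l.l_id = pc.j_location
    GROUP BY l.l_name
""").fetchall()

total_cnt = sum(cnt for _, cnt, _ in rows)           # 2 + 1 + 2
total_shared = sum(shared for _, _, shared in rows)  # 0.5 + 0 + 0.5
headcount = total_cnt - total_shared

for name, cnt, shared in rows:
    print(name, cnt, shared)
print(headcount)  # 4.0, matching the 4 rows in person
```

On the sample data this gives the same cnt_of_people_working_here and shared_people values as the output above, and the total row works out to 5 - 1 = 4.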