MySQL Right Join Data Not Counting Correctly

I have this data:

+-------------+--------------------+-----------------------+
| employee_id | assignment_started | assignment_terminated |
+-------------+--------------------+-----------------------+
|           1 | 2018-07-01         | (NULL)                |
|           2 | 2018-09-01         | (NULL)                |
|           3 | 2018-10-13         | (NULL)                |
|           4 | 2018-10-13         | (NULL)                |
|           5 | 2018-10-15         | 2019-07-17            |
|           6 | 2018-11-01         | (NULL)                |
|           7 | 2019-01-14         | (NULL)                |
|           8 | 2019-01-24         | (NULL)                |
|           9 | 2019-07-01         | 2019-07-30            |
+-------------+--------------------+-----------------------+

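I want to count, per month, the employees who are on assignment. To decide whether an employee is on assignment, I need to check whether the date I am looking at falls between assignment_started and assignment_terminated. If assignment_terminated is NULL, I treat it as NOW().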

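I also have a date range to check. So if my date range is 2018-01-01 to 2019-07-30, I need the employee count for every month in it, and for months in which no employee was on assignment I should get a count of 0.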

To generate the months in the date range I use this code:

select DISTINCT CONCAT(YEAR(gen_date),' ',MONTHNAME(gen_date)) AS month_name FROM 
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM 
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0, 
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1, 
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2, 
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3, 
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v 
WHERE gen_date between '2018-01-01 00:00:00' and '2019-08-31 23:59:59'

What I get from it is:

+-------------+
| month_name  | 
+-------------+
|2018 January |
|2018 February| 
|2018 March   | 
|2018 April   | 
|         ... | 
|         ... |
|         ... |
|2019 August  | 
+-------------+

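From the data above you can see that before July 2018 I have 0 employees, in July 2018 I have 1 employee, and, for example, in September 2018 the count should be 5, because 5 employees were working in that month.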

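To keep the question short: I use the code below to get what I need, but for some reason the counts are wrong. I have been trying to work it out, but I don't understand why I get the results you can see in the fiddle below.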

SELECT calendar.month_name, COUNT(employee_id) AS emp_count
FROM job_order_employees
RIGHT JOIN (select DISTINCT CONCAT(YEAR(gen_date),' ',MONTHNAME(gen_date)) AS month_name FROM 
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM 
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0, 
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1, 
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2, 
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3, 
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v 
WHERE gen_date between '2018-01-01 00:00:00' and '2019-08-31 23:59:59') as calendar
ON STR_TO_DATE(CONCAT(calendar.month_name,'01'),'%Y %M %d') BETWEEN job_order_employees.assignment_started AND IFNULL(job_order_employees.assignment_terminated,NOW())
GROUP BY calendar.month_name
ORDER BY STR_TO_DATE(calendar.month_name,'%Y %M') 

Here is some sample data:

-- Dumping structure for table d-works-test.job_order_employees
CREATE TABLE IF NOT EXISTS `job_order_employees` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `employee_id` int(10) unsigned NOT NULL,
  `assignment_started` date NOT NULL,
  `assignment_terminated` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Dumping data for table d-works-test.job_order_employees: ~14 rows (approximately)
/*!40000 ALTER TABLE `job_order_employees` DISABLE KEYS */;
INSERT INTO `job_order_employees` 
(`id`
, `employee_id`
,`assignment_started`
, `assignment_terminated`) VALUES
(1, 1,'2019-05-29', NULL),
(2, 2,'2018-09-19', NULL),
(3, 3,'2018-07-01', NULL),
(4, 4, '2018-10-13', NULL),
(5, 5, '2018-10-13', NULL),
(6, 6, '2019-02-01', NULL),
(7, 7, '2019-01-14', NULL),
(8, 8, '2018-11-01', NULL),
(9, 8, '2019-01-01', NULL),
(10, 9, '2019-02-01', NULL),
(11, 9, '2019-01-24', NULL),
(12, 9, '2018-12-31', NULL),
(13, 10, '2018-10-13', '2019-07-17'),
(14, 10, '2019-07-01', '2019-07-17');

The same as a DB Fiddle: https://www.db-fiddle.com/f/8dUFx1DWiyypbkx9s2cYyG/1

Thanks in advance for your help!

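You can simplify your logic a great deal by leaving the formatting of the month string until the very last step; you are doing a lot of conversion work for something that only matters for the final output.

It also helps because that way you can define a start and an end for each month, like so:

SELECT adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM (stuff) v

Then use it like this: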

SELECT [format rangestart here], COUNT(employee_id) AS emp_count
FROM (
   SELECT DISTINCT gen_date AS rangestart, gen_date + INTERVAL 1 MONTH AS rangeend 
   FROM v
   WHERE gen_date BETWEEN '2018-01-01 00:00:00' AND '2019-08-31 23:59:59'
) as calendar
LEFT JOIN job_order_employees AS joe
   ON IFNULL(joe.assignment_terminated,NOW()) >= calendar.rangestart
   AND joe.assignment_started <= calendar.rangeend
GROUP BY calendar.rangestart
ORDER BY calendar.rangestart 
;

The join logic (the overlap condition) looks a little odd until you realize where it comes from. It is a simplification of "not the ones that don't overlap":

NOT (ended < range_start || started > range_end) simplifies to ended >= range_start && started <= range_end
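
As a quick sanity check of that simplification, the sketch below evaluates both forms against a few made-up intervals and shows that they agree row by row. The interval dates and the July 2019 range are illustrative only and do not come from the question.

```sql
-- Illustrative intervals only; the range is 2019-07-01 .. 2019-07-31 (inclusive end).
SELECT t.started,
       t.ended,
       NOT (t.ended < '2019-07-01' OR t.started > '2019-07-31') AS overlaps_long,
       (t.ended >= '2019-07-01' AND t.started <= '2019-07-31')  AS overlaps_short
FROM (
    SELECT DATE '2019-06-01' AS started, DATE '2019-06-30' AS ended  -- ends before the range
    UNION ALL SELECT DATE '2019-06-15', DATE '2019-07-10'            -- straddles the range start
    UNION ALL SELECT DATE '2019-07-05', DATE '2019-07-20'            -- fully inside the range
    UNION ALL SELECT DATE '2019-08-01', DATE '2019-08-15'            -- starts after the range
) AS t;
```

The two columns should match on every row: only the two middle intervals overlap the range.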


Edit: The above incorrectly assumed that the subquery generated one row per month; the following should work.

Calendar query (this covers about 83 years; you can add another t# table with a multiplier of 1000 to get 833 years' worth):

SELECT '1970-01-01' + INTERVAL t0 + t1 * 10 + t2 * 100 MONTH AS start_date
    , '1970-01-01' + INTERVAL 1 + t0 + t1 * 10 + t2 * 100 MONTH AS end_date  
FROM (SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0
    , (SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1
    , (SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2
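
To see how far this generator reaches, you can evaluate the largest offset directly. With three digit tables the offsets run from 0 to 999 months, so the last row is 999 months after the 1970-01-01 epoch:

```sql
-- 999 months = 83 years and 3 months after 1970-01-01.
SELECT '1970-01-01' + INTERVAL 999 MONTH     AS last_start_date,  -- 2053-04-01
       '1970-01-01' + INTERVAL 1 + 999 MONTH AS last_end_date;    -- 2053-05-01
```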

Final query:

SELECT [format calendar.start_date here]
   , COUNT(employee_id) AS emp_count
FROM ( 
   *calendar query above goes here* 
) as calendar
LEFT JOIN job_order_employees AS joe
   ON IFNULL(joe.assignment_terminated,NOW()) >= calendar.start_date
   AND joe.assignment_started < calendar.end_date
WHERE calendar.start_date BETWEEN '2018-01-01 00:00:00' AND '2019-08-31 23:59:59'
GROUP BY calendar.start_date
ORDER BY calendar.start_date
;

Note: I also changed the operators in the overlap comparison. Because the generated end_date is non-inclusive, it should be NOT (ended < range_start || started >= range_end), which simplifies to ended >= range_start && started < range_end
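
For reference, here is one way the placeholders could be filled in. This is a sketch rather than the exact query above. It assumes MySQL 8.0 or later and swaps the digit-table calendar for a recursive CTE. It uses DATE_FORMAT to produce the "2018 January" style label from the question. It also counts DISTINCT employee_id, on the assumption that an employee with several overlapping rows (as in the sample data) should be counted once per month.

```sql
-- Assumes MySQL 8.0+ (recursive CTEs); table and column names are from the question.
WITH RECURSIVE calendar (start_date) AS (
    SELECT DATE '2018-01-01'                 -- first month of the requested range
    UNION ALL
    SELECT start_date + INTERVAL 1 MONTH     -- step one month at a time
    FROM calendar
    WHERE start_date < DATE '2019-08-01'     -- last generated month is 2019-08-01
)
SELECT DATE_FORMAT(calendar.start_date, '%Y %M') AS month_name,
       COUNT(DISTINCT joe.employee_id)           AS emp_count   -- 0 for months with no match
FROM calendar
LEFT JOIN job_order_employees AS joe
       -- assignment has not ended before the month starts ...
       ON IFNULL(joe.assignment_terminated, NOW()) >= calendar.start_date
       -- ... and has started before the (non-inclusive) end of the month
      AND joe.assignment_started < calendar.start_date + INTERVAL 1 MONTH
GROUP BY calendar.start_date
ORDER BY calendar.start_date;
```

If every assignment row should be counted instead, replace COUNT(DISTINCT joe.employee_id) with COUNT(joe.employee_id).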

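I would suggest using COALESCE to fill in the current date. I would then build the list of months that need to be counted and join it to the list of assignments, grouped by employee and month.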
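
A minimal sketch of the COALESCE part, using the table from the question. COALESCE is the standard SQL counterpart of MySQL's IFNULL. CURDATE() is used here on the assumption that a date, rather than a datetime, is what you want to compare against the DATE columns.

```sql
SELECT employee_id,
       assignment_started,
       -- open-ended assignments are treated as ending today
       COALESCE(assignment_terminated, CURDATE()) AS effective_end
FROM job_order_employees;
```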