Select distinct records and at the same time first occurrence of the record from MySQL
I'm not familiar enough with MySQL execution plans, so I need help understanding and figuring out how to operate on a subset of data in MySQL, if that is possible. I have two tables:
Table users:
+-----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+----------------+
| user_id | int(11) | NO | PRI | NULL | auto_increment |
| msisdn | bigint(20) | NO | UNI | NULL | |
| activation_date | datetime | NO | | NULL | |
| msisdn_type | varchar(32) | NO | | NULL | |
+-----------------+-------------+------+-----+---------+----------------+
Table log_archive:
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| msisdn | bigint(11) | NO | MUL | NULL | |
| msisdn_type | varchar(32) | NO | | NULL | |
| date | date | NO | | NULL | |
| action | varchar(32) | NO | | NULL | |
+-------------+--------------+------+-----+---------+-------+
In table users, msisdn is unique, but in log_archive it is not.
Here you can find a PHP script that will generate test data for these two tables:
Test data generation script helper
I need to select:
1) All distinct records by msisdn from table log_archive;
2) By earliest date per msisdn for one specific action only;
3) For a specific date range from table log_archive;
4) And to join activation_date from users table with msisdn from both tables.
Let me give an example. Suppose this is the sample data from the log_archive table:
+--------------+------------+---------------------+----------------+
| msisdn | date | activation_date | action |
+--------------+------------+---------------------+----------------+
| 977129764170 | 2016-02-11 | 2014-10-07 00:00:00 | all_services |
| 977129764170 | 2015-09-05 | 2014-10-07 00:00:00 | app_start |
| 977129764170 | 2015-05-08 | 2014-10-07 00:00:00 | widget |
| 986629508626 | 2015-07-12 | 2016-02-05 00:00:00 | app_start |
| 986629508626 | 2015-03-02 | 2016-02-05 00:00:00 | number_connect |
| 986629508626 | 2015-05-08 | 2016-02-05 00:00:00 | widget |
| 986629508626 | 2015-01-08 | 2016-02-05 00:00:00 | app_start |
| 933563888440 | 2016-02-20 | 2014-10-06 00:00:00 | all_services |
| 933563888440 | 2015-03-12 | 2014-10-06 00:00:00 | app_start |
| 933563888440 | 2015-04-26 | 2014-10-06 00:00:00 | number_connect |
| 933563888440 | 2015-10-17 | 2014-10-06 00:00:00 | all_services |
| 943730853721 | 2015-06-19 | 2015-05-01 00:00:00 | widget |
| 943730853721 | 2015-12-08 | 2015-05-01 00:00:00 | app_start |
| 943730853721 | 2016-02-09 | 2015-05-01 00:00:00 | app_start |
+--------------+------------+---------------------+----------------+
The distinct msisdns here are 977129764170, 986629508626, 933563888440, 943730853721;
The earliest dates for the distinct msisdn values where the action column equals 'app_start' are:
977129764170 is 2015-09-05
986629508626 is 2015-01-08
933563888440 is 2015-03-12
943730853721 is 2015-12-08
I need to write SQL that gives me this output:
+--------------+------------+---------------------+----------------+
| msisdn | date | activation_date | action |
+--------------+------------+---------------------+----------------+
| 977129764170 | 2015-09-05 | 2014-10-07 00:00:00 | app_start |
| 986629508626 | 2015-01-08 | 2016-02-05 00:00:00 | app_start |
| 933563888440 | 2015-03-12 | 2014-10-06 00:00:00 | app_start |
| 943730853721 | 2015-12-08 | 2015-05-01 00:00:00 | app_start |
+--------------+------------+---------------------+----------------+
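So I need to select all distinct msisdns with the earliest date on which the app_start action occurred, joined with the activation_date from the users table for that msisdn, and only for a specific date range in the date column.
I tried this SQL, with no luck: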
SELECT DISTINCT(log_archive.msisdn) as msisdn, DATE(log_archive.date) AS actionDate, users.activation_date
FROM log_archive
INNER JOIN users on log_archive.msisdn = users.msisdn
WHERE log_archive.action = 'app_start' && log_archive.date BETWEEN '2015-01-08' AND '2016-03-15'
ORDER BY actionDate ASC;
Even though I used DISTINCT, I get the same msisdn more than once.
Do I need to use a subquery?
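DISTINCT looks at all the returned columns, so it removes duplicate rows of returned data, not duplicate values of a single column. So if you only want distinct rows from log_archive, apply it in a subquery before joining.
Like: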
(SELECT DISTINCT * FROM log_archive) AS distinct_Log INNER JOIN...
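To see the whole-row behaviour concretely, here is a minimal sketch in Python. It assumes the standard sqlite3 module as a stand-in for MySQL, and it trims the table to three columns and four app_start rows taken from the question's sample data, so treat it as an illustration rather than the asker's real setup.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_archive (msisdn INTEGER, date TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO log_archive VALUES (?, ?, ?)",
    [
        (986629508626, "2015-07-12", "app_start"),
        (986629508626, "2015-01-08", "app_start"),
        (943730853721, "2015-12-08", "app_start"),
        (943730853721, "2016-02-09", "app_start"),
    ],
)

# DISTINCT compares the whole (msisdn, date) row, so both dates per msisdn survive.
distinct_rows = conn.execute(
    "SELECT DISTINCT msisdn, date FROM log_archive "
    "WHERE action = 'app_start' ORDER BY msisdn, date"
).fetchall()

# Only when msisdn is the sole selected column does DISTINCT give one row per number.
distinct_msisdns = conn.execute(
    "SELECT DISTINCT msisdn FROM log_archive ORDER BY msisdn"
).fetchall()

print(len(distinct_rows), len(distinct_msisdns))
```

The first query still returns every (msisdn, date) pair, which is the duplication described in the question; wrapping the column in parentheses, as in DISTINCT(log_archive.msisdn), does not change that.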
You need a GROUP BY to get the MIN(date) for each msisdn:
SELECT msisdn, MIN(date) date, MIN(action) action
FROM log_archive
WHERE action='app_start'
AND date BETWEEN '2015-01-08' AND '2016-03-15'
GROUP BY msisdn
We also add a MIN(action), since every non-grouped field should be aggregated, and since the action is the same for all selected rows, MIN works fine.
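As a quick check of the grouped query, the sketch below runs it against the sample rows from the question. It assumes Python's sqlite3 module in place of MySQL and renames the result alias to first_date (a hypothetical name, chosen so it cannot be confused with the date column); the rest of the SQL follows the query above.

```python
import sqlite3

# Sample rows copied from the question: (msisdn, date, action).
SAMPLE_LOG = [
    (977129764170, "2016-02-11", "all_services"),
    (977129764170, "2015-09-05", "app_start"),
    (977129764170, "2015-05-08", "widget"),
    (986629508626, "2015-07-12", "app_start"),
    (986629508626, "2015-03-02", "number_connect"),
    (986629508626, "2015-05-08", "widget"),
    (986629508626, "2015-01-08", "app_start"),
    (933563888440, "2016-02-20", "all_services"),
    (933563888440, "2015-03-12", "app_start"),
    (933563888440, "2015-04-26", "number_connect"),
    (933563888440, "2015-10-17", "all_services"),
    (943730853721, "2015-06-19", "widget"),
    (943730853721, "2015-12-08", "app_start"),
    (943730853721, "2016-02-09", "app_start"),
]

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_archive (msisdn INTEGER, date TEXT, action TEXT)")
conn.executemany("INSERT INTO log_archive VALUES (?, ?, ?)", SAMPLE_LOG)

# One row per msisdn: the earliest app_start date inside the requested range.
earliest = conn.execute(
    """
    SELECT msisdn, MIN(date) AS first_date, MIN(action) AS action
    FROM log_archive
    WHERE action = 'app_start'
      AND date BETWEEN '2015-01-08' AND '2016-03-15'
    GROUP BY msisdn
    ORDER BY msisdn
    """
).fetchall()

for row in earliest:
    print(row)
```

Note that for 943730853721 the 2015-06-19 row is a widget action, so the WHERE clause drops it before grouping and the earliest app_start date is 2015-12-08.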
Once that is done, adding the join is quite simple:
SELECT a.msisdn, MIN(a.date) date, u.activation_date, MIN(a.action) action
FROM log_archive a
JOIN users u
ON u.msisdn = a.msisdn
WHERE a.action='app_start'
AND a.date BETWEEN '2015-01-08' AND '2016-03-15'
GROUP BY a.msisdn
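The joined version can be tried out the same way. This is a sketch under a few assumptions: Python's sqlite3 module stands in for MySQL, the tables are cut down to the columns the query touches, and first_date is a hypothetical alias. It also lists u.activation_date in the GROUP BY, which is a deviation from the query above; since msisdn is unique in users it should not change the result, and it avoids relying on how a given engine treats non-aggregated columns.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (msisdn INTEGER UNIQUE, activation_date TEXT)")
conn.execute("CREATE TABLE log_archive (msisdn INTEGER, date TEXT, action TEXT)")

# Activation dates and log rows taken from the question's sample data.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (977129764170, "2014-10-07 00:00:00"),
        (986629508626, "2016-02-05 00:00:00"),
        (933563888440, "2014-10-06 00:00:00"),
        (943730853721, "2015-05-01 00:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO log_archive VALUES (?, ?, ?)",
    [
        (977129764170, "2015-09-05", "app_start"),
        (986629508626, "2015-07-12", "app_start"),
        (986629508626, "2015-01-08", "app_start"),
        (933563888440, "2015-03-12", "app_start"),
        (943730853721, "2015-06-19", "widget"),
        (943730853721, "2015-12-08", "app_start"),
        (943730853721, "2016-02-09", "app_start"),
    ],
)

# Earliest app_start per msisdn in the date range, with the user's activation date.
rows = conn.execute(
    """
    SELECT a.msisdn, MIN(a.date) AS first_date, u.activation_date,
           MIN(a.action) AS action
    FROM log_archive a
    JOIN users u ON u.msisdn = a.msisdn
    WHERE a.action = 'app_start'
      AND a.date BETWEEN '2015-01-08' AND '2016-03-15'
    GROUP BY a.msisdn, u.activation_date
    ORDER BY first_date
    """
).fetchall()

for row in rows:
    print(row)
```

On this data the four rows match the output requested in the question, ordered by the earliest app_start date.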