Select distinct records and, at the same time, the first occurrence of each record from MySQL

I am not familiar enough with MySQL execution plans, so I need help understanding how to work on a subset of data in MySQL, if that is possible. I have two tables:

Table users:

+-----------------+-------------+------+-----+---------+----------------+
| Field           | Type        | Null | Key | Default | Extra          |
+-----------------+-------------+------+-----+---------+----------------+
| user_id         | int(11)     | NO   | PRI | NULL    | auto_increment |
| msisdn          | bigint(20)  | NO   | UNI | NULL    |                |
| activation_date | datetime    | NO   |     | NULL    |                |
| msisdn_type     | varchar(32) | NO   |     | NULL    |                |
+-----------------+-------------+------+-----+---------+----------------+

Table log_archive:

+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| msisdn      | bigint(11)   | NO   | MUL | NULL    |       |
| msisdn_type | varchar(32)  | NO   |     | NULL    |       |
| date        | date         | NO   |     | NULL    |       |
| action      | varchar(32)  | NO   |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+ 

In table users, msisdn is unique, but in log_archive it is not.

Here you can find a PHP script that generates test data for both tables:

Test data generation script helper

I need to select:

1) All distinct records by msisdn from table log_archive;
2) By earliest date per msisdn for one specific action only;
3) For a specific date range from table log_archive;
4) And to join activation_date from users table with msisdn from both tables.

Let me give an example. Suppose this is sample data from the log_archive table (with activation_date already joined in from users):
+--------------+------------+---------------------+----------------+
|    msisdn    |    date    |   activation_date   |     action     |
|--------------+------------+---------------------+----------------+
| 977129764170 | 2016-02-11 | 2014-10-07 00:00:00 | all_services   |
| 977129764170 | 2015-09-05 | 2014-10-07 00:00:00 | app_start      |
| 977129764170 | 2015-05-08 | 2014-10-07 00:00:00 | widget         |
| 986629508626 | 2015-07-12 | 2016-02-05 00:00:00 | app_start      |
| 986629508626 | 2015-03-02 | 2016-02-05 00:00:00 | number_connect |
| 986629508626 | 2015-05-08 | 2016-02-05 00:00:00 | widget         |
| 986629508626 | 2015-01-08 | 2016-02-05 00:00:00 | app_start      |
| 933563888440 | 2016-02-20 | 2014-10-06 00:00:00 | all_services   |
| 933563888440 | 2015-03-12 | 2014-10-06 00:00:00 | app_start      |
| 933563888440 | 2015-04-26 | 2014-10-06 00:00:00 | number_connect |
| 933563888440 | 2015-10-17 | 2014-10-06 00:00:00 | all_services   |
| 943730853721 | 2015-06-19 | 2015-05-01 00:00:00 | widget         |
| 943730853721 | 2015-12-08 | 2015-05-01 00:00:00 | app_start      |
| 943730853721 | 2016-02-09 | 2015-05-01 00:00:00 | app_start      |
+--------------+------------+---------------------+----------------+

The distinct msisdns here are 977129764170, 986629508626, 933563888440 and 943730853721.

The earliest dates on which the action column equals 'app_start', per distinct msisdn, are:

977129764170 is 2015-09-05
986629508626 is 2015-01-08
933563888440 is 2015-03-12
943730853721 is 2015-12-08

I need SQL that gives me this output:

+--------------+------------+---------------------+----------------+
|    msisdn    |    date    |   activation_date   |     action     |
|--------------+------------+---------------------+----------------+
| 977129764170 | 2015-09-05 | 2014-10-07 00:00:00 | app_start      |
| 986629508626 | 2015-01-08 | 2016-02-05 00:00:00 | app_start      |
| 933563888440 | 2015-03-12 | 2014-10-06 00:00:00 | app_start      |
| 943730853721 | 2015-12-08 | 2015-05-01 00:00:00 | app_start      |
+--------------+------------+---------------------+----------------+

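So I need to select every distinct msisdn together with the earliest date on which its app_start action occurred, and join in activation_date from the users table for that msisdn. The search should cover only a specific range of the date column.

I tried this SQL, without success: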

SELECT DISTINCT(log_archive.msisdn) as msisdn, DATE(log_archive.date) AS actionDate, users.activation_date

FROM log_archive 

INNER JOIN users on log_archive.msisdn = users.msisdn

WHERE log_archive.action = 'app_start' && log_archive.date BETWEEN '2015-01-08' AND '2016-03-15'

ORDER BY actionDate ASC;

Even though I used DISTINCT, I get the same msisdn more than once.

Do I need to use a subquery?

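DISTINCT looks at all returned columns, so it removes duplicates among whole returned rows, not among values of a single column. If you only want the distinct rows from log_archive, apply it in a subquery before joining. Like: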

(SELECT DISTINCT * FROM log_archive) AS distinct_Log INNER JOIN...
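To make the point about DISTINCT concrete, here is a minimal sketch. It assumes an in-memory SQLite database (via Python's `sqlite3`) as a stand-in for MySQL and a cut-down log_archive table with three rows taken from the question's sample data. DISTINCT is not a function, so the parentheses in `DISTINCT(msisdn)` do not restrict it to that column.

```python
import sqlite3

# Assumption for illustration: SQLite stands in for MySQL, and the table
# is reduced to the three columns that matter here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_archive (msisdn INTEGER, date TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO log_archive VALUES (?, ?, ?)",
    [
        (986629508626, "2015-07-12", "app_start"),
        (986629508626, "2015-01-08", "app_start"),
        (977129764170, "2015-09-05", "app_start"),
    ],
)

# DISTINCT applies to the whole (msisdn, date) row, so the msisdn that has
# two different dates still comes back twice.
rows_two_cols = conn.execute(
    "SELECT DISTINCT(msisdn), date FROM log_archive WHERE action = 'app_start'"
).fetchall()

# With msisdn as the only selected column, each value appears once.
rows_one_col = conn.execute(
    "SELECT DISTINCT msisdn FROM log_archive WHERE action = 'app_start'"
).fetchall()

print(len(rows_two_cols))  # 3 rows: one msisdn is repeated
print(len(rows_one_col))   # 2 rows: one per msisdn
```

This is why the query in the question returns the same msisdn several times: every row differs in its date.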

You need GROUP BY to get the MIN(date) for each msisdn:

SELECT msisdn, MIN(date) date, MIN(action) action 
FROM log_archive 
WHERE action='app_start' 
  AND date BETWEEN '2015-01-08' AND '2016-03-15' 
GROUP BY msisdn

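We also add a MIN(action), since every field that is not in the GROUP BY should be aggregated. Because the WHERE clause leaves only rows with the same action, MIN simply returns that action.

Once that is done, adding the join is quite simple: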
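As a quick check of the grouping step, here is a minimal sketch that runs the same query against a few rows from the question's sample data. It assumes an in-memory SQLite database (Python's `sqlite3`) in place of MySQL, with dates stored as ISO text, which compares correctly as strings.

```python
import sqlite3

# Assumption for illustration: SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log_archive (msisdn INTEGER, date TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO log_archive VALUES (?, ?, ?)",
    [
        (943730853721, "2015-06-19", "widget"),     # earlier, but not app_start
        (943730853721, "2015-12-08", "app_start"),
        (943730853721, "2016-02-09", "app_start"),
        (986629508626, "2015-07-12", "app_start"),
        (986629508626, "2015-01-08", "app_start"),
    ],
)

# One row per msisdn: the WHERE clause filters first, then MIN(date)
# picks the earliest remaining date inside each group.
earliest = conn.execute(
    """
    SELECT msisdn, MIN(date) AS date, MIN(action) AS action
    FROM log_archive
    WHERE action = 'app_start'
      AND date BETWEEN '2015-01-08' AND '2016-03-15'
    GROUP BY msisdn
    ORDER BY msisdn
    """
).fetchall()

for row in earliest:
    print(row)
# (943730853721, '2015-12-08', 'app_start')
# (986629508626, '2015-01-08', 'app_start')
```

Note that the widget row dated 2015-06-19 is ignored: the filter on action runs before the aggregation, so only app_start dates compete for the minimum.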

SELECT a.msisdn, MIN(a.date) date, u.activation_date, MIN(a.action) action 
FROM log_archive a
JOIN users u
  ON u.msisdn = a.msisdn
WHERE a.action='app_start' 
  AND a.date BETWEEN '2015-01-08' AND '2016-03-15' 
GROUP BY a.msisdn, u.activation_date
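To check the joined query against the expected output in the question, here is a sketch that loads the question's sample data and runs it. The assumptions: an in-memory SQLite database (Python's `sqlite3`) replaces MySQL, the msisdn_type columns are left out, and an ORDER BY is added only to make the result order predictable. activation_date is listed in the GROUP BY so the query does not depend on how the server treats non-aggregated columns.

```python
import sqlite3

# Assumption for illustration: SQLite stands in for MySQL; msisdn_type omitted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "msisdn INTEGER NOT NULL UNIQUE, "
    "activation_date TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE log_archive ("
    "msisdn INTEGER NOT NULL, date TEXT NOT NULL, action TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (msisdn, activation_date) VALUES (?, ?)",
    [
        (977129764170, "2014-10-07 00:00:00"),
        (986629508626, "2016-02-05 00:00:00"),
        (933563888440, "2014-10-06 00:00:00"),
        (943730853721, "2015-05-01 00:00:00"),
    ],
)
conn.executemany(
    "INSERT INTO log_archive VALUES (?, ?, ?)",
    [
        (977129764170, "2016-02-11", "all_services"),
        (977129764170, "2015-09-05", "app_start"),
        (977129764170, "2015-05-08", "widget"),
        (986629508626, "2015-07-12", "app_start"),
        (986629508626, "2015-03-02", "number_connect"),
        (986629508626, "2015-05-08", "widget"),
        (986629508626, "2015-01-08", "app_start"),
        (933563888440, "2016-02-20", "all_services"),
        (933563888440, "2015-03-12", "app_start"),
        (933563888440, "2015-04-26", "number_connect"),
        (933563888440, "2015-10-17", "all_services"),
        (943730853721, "2015-06-19", "widget"),
        (943730853721, "2015-12-08", "app_start"),
        (943730853721, "2016-02-09", "app_start"),
    ],
)

result = conn.execute(
    """
    SELECT a.msisdn, MIN(a.date) AS date, u.activation_date, MIN(a.action) AS action
    FROM log_archive a
    JOIN users u
      ON u.msisdn = a.msisdn
    WHERE a.action = 'app_start'
      AND a.date BETWEEN '2015-01-08' AND '2016-03-15'
    GROUP BY a.msisdn, u.activation_date
    ORDER BY a.msisdn
    """
).fetchall()

for row in result:
    print(row)
# (933563888440, '2015-03-12', '2014-10-06 00:00:00', 'app_start')
# (943730853721, '2015-12-08', '2015-05-01 00:00:00', 'app_start')
# (977129764170, '2015-09-05', '2014-10-07 00:00:00', 'app_start')
# (986629508626, '2015-01-08', '2016-02-05 00:00:00', 'app_start')
```

Apart from row order, this matches the four rows of the desired output table in the question.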