使用 GROUP BY 和 JOIN-ing 非聚合列查询

Using GROUP BY and JOIN-ing non-aggregated columns to query

我知道有很多类似的问题,但是我没有通过阅读来解决我的问题。我会很感激一些指示。

下面是我的 dummy table 中的一些示例数据:

id foo bar baz moo ins_date percentage yes no maybe
38 foothing bar_one pizazz amoosing 2018-05-26 06:59:00 81 25 529 196
41 foothing bar_one pizazz amoosing 2018-05-29 06:43:00 83 441 144 49
23 foothing bar_one pizazz amoosing 2018-06-24 08:48:00 62 9 1 16
20 foothing bar_one pizazz amoosing 2018-06-27 10:37:00 94 676 16 400
65 foothing bar_one pizazz amoosing 2018-07-01 08:34:00 92 121 64 225
68 foothing bar_one pizazz amoosing 2018-07-04 01:46:00 91 324 25 289
71 foothing bar_one pizazz amoosing 2018-07-06 23:44:00 65 196 676 100
74 foothing bar_one pizazz amoosing 2018-07-10 09:41:00 92 1024 121 81
77 foothing bar_one pizazz amoosing 2018-07-13 06:47:00 64 576 169 1
96 foothing bar_one pizazz amoosing 2018-08-02 10:34:00 78 1369 256 81
99 foothing bar_one pizazz amoosing 2018-08-04 08:25:00 82 2809 9 256
102 foothing bar_one pizazz amoosing 2018-08-07 06:49:00 87 576 9 676
105 foothing bar_one pizazz amoosing 2018-08-10 03:29:00 68 4225 1089 196
108 foothing bar_one pizazz amoosing 2018-08-13 03:59:00 92 1156 169 484
111 foothing bar_one pizazz amoosing 2018-08-16 05:34:00 63 1764 100 108

我想通过单次查询实现:

  1. 过滤所有行以获得 ins_date 我选择的日期范围
  2. 获取每组 foo、bar、baz 和 moo 的 最大值 ins_date
  3. 能够按 foo、bar、baz 和 moo 筛选行
  4. 另外在查询中显示未分组的值,例如百分比、是、否和可能。

总的来说,这被证明是复杂的。到目前为止,我已经设法实现了下面查询中的前 3 点,希望这能解释我正在寻找的内容:

SELECT
  s.foo,
  s.bar, 
  s.baz,
  s.moo,
  MAX(s.ins_date) mdate
FROM 
(
  SELECT *
  FROM dummy
  WHERE ins_date
  -- My arbitrary date range goes here
  BETWEEN '2018-07-01 00:00:00'
  AND '2019-11-01 23:59:59'
) s

GROUP BY foo, bar, baz, moo
-- I could add other filters into the 'HAVING' clause
HAVING moo LIKE "%moo%"
AND baz = "baz"

这给出了输出:

foo bar baz moo mdate
foothing bar_one baz amoosing 2018-11-29 05:31:00
foothing bar_one baz mooman_being 2019-04-21 10:31:00
foothing bar_one baz strawberry_moosse 2019-03-17 06:37:00

在此示例中,如果我要更改日期限制以仅显示 2018-05-012018-05-29 之间的日期,那么第一行 mdate 将显示 2018-05-29 06:43:00,因为对于 foo/bar/baz/moo.

的特定分组,这是该日期范围内的最新(最近)日期

但我无法附加与该分组无关的其他列。我试过使用 JOIN ...

SELECT
  s1.foo,
  s1.bar, 
  s1.baz,
  s1.moo,
  MAX(s1.ins_date) mdate,
  s2.percentage,
  s2.yes,
  s2.maybe,
  s2.no
FROM 
(
  SELECT *
  FROM dummy
  WHERE ins_date
  -- My arbitrary date range goes here
  BETWEEN '2018-07-01 00:00:00'
  AND '2019-11-01 23:59:59'
) s1
-- Attempting to a do a self-join to get the non-aggregated columns
INNER JOIN
(
  SELECT id, percentage, yes, maybe, no
  FROM dummy
) s2
ON s2.id = s1.id

GROUP BY foo, bar, baz, moo
-- I could add other filters into the 'HAVING' clause
HAVING moo LIKE "%moo%"
AND baz = "baz"

但是这个returns错误:

Expression #6 of SELECT list is not in GROUP BY clause and contains nonaggregated column 's2.percentage' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

如何在不破坏查询的情况下将非聚合列添加到查询中?

我正在使用 mysql 5.7,所以 mysql 8 个选项不可用。

Fiddle 下面:

https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=1980dd582c2235dc0938cb14c781e3c6

也许通过聚合额外的列?

SELECT foo, bar, baz, moo
, MAX(ins_date) AS mdate
, AVG(percentage) AS avg_perc
, MAX(yes) AS YEAHBABY
, MAX(maybe) AS MAYBEBABY
, MAX(no) AS NONONONONOOO
FROM dummy dum
WHERE ins_date BETWEEN '2018-07-01 00:00:00'
                   AND '2019-11-01 23:59:59'
  AND moo LIKE '%moo%'
  AND baz = 'baz' 
GROUP BY foo, bar, baz, moo
foo      | bar     | baz | moo               | mdate               | avg_perc | YEAHBABY | MAYBEBABY | NONONONONOOO
:------- | :------ | :-- | :---------------- | :------------------ | -------: | -------: | --------: | -----------:
foothing | bar_one | baz | amoosing          | 2018-11-29 05:31:00 |  82.8000 |    11236 |       625 |          841
foothing | bar_one | baz | mooman_being      | 2019-04-21 10:31:00 |  70.0000 |     3969 |        16 |          121
foothing | bar_one | baz | strawberry_moosse | 2019-03-17 06:37:00 |  80.0000 |    23716 |       529 |           49

db<>fiddle here

或加入分组字段和最大日期。

或使用模拟 row_number。

或使用 EXISTS.

SELECT foo, bar, baz, moo
, ins_date
, percentage
, yes, maybe, no
FROM dummy dum
WHERE EXISTS (
  SELECT 1
  FROM dummy dum2
  WHERE dum2.ins_date BETWEEN '2018-07-01 00:00:00'
                          AND '2019-11-01 23:59:59'
    AND dum2.moo LIKE '%moo%'
    AND dum2.baz = 'baz'
    AND dum2.foo = dum.foo
    AND dum2.bar = dum.bar
    AND dum2.baz = dum.baz
    AND dum2.moo = dum.moo
  GROUP BY foo, bar, baz, moo
  HAVING MAX(dum2.ins_date) = dum.ins_date
);
foo      | bar     | baz | moo               | ins_date            | percentage |   yes | maybe |  no
:------- | :------ | :-- | :---------------- | :------------------ | ---------: | ----: | ----: | --:
foothing | bar_one | baz | strawberry_moosse | 2019-03-17 06:37:00 |         80 | 23716 |   529 |  49
foothing | bar_one | baz | mooman_being      | 2019-04-21 10:31:00 |         70 |  3969 |    16 | 121
foothing | bar_one | baz | amoosing          | 2018-11-29 05:31:00 |         97 |  9025 |   361 |   1

db<>fiddle here