Oracle statistics like STATS_MODE: avoiding the default tie behavior by extending to another column

Query to extract the most popular hour from a list of dates:

Table columns: ID, PARENT_ID, STARTED, DURATION

```sql
select STATS_MODE(extract(HOUR from started)) as most_pop_call_start,
       avg(duration) as avg_duration
from table where parent_id = 'xxx';
```

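This works well, but when several hours tie for the highest count, STATS_MODE returns only one of them, and by default it takes the smallest value.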

Instead, in the edge case where there is no unique mode, I want to break the tie using the duration column as well.
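To see why a tie-breaker is needed, here is a small Python sketch (Python rather than Oracle SQL so it runs without a database) that lists every value sharing the highest frequency. `tied_modes` is a hypothetical helper, not an Oracle function. With the hours taken from the example table in this question, four hours tie, and the smallest of them is the single value the question reports STATS_MODE returning.

```python
from collections import Counter


def tied_modes(values):
    """Return every value that shares the highest frequency, sorted ascending."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Hours extracted from the STARTED column of the example table.
hours = [10, 10, 11, 11, 12, 12, 13, 13, 14, 15]
modes = tied_modes(hours)
print(modes)     # [10, 11, 12, 13]
print(modes[0])  # 10: the smallest tied hour
```

A single "mode" is therefore ambiguous here; any tie-breaking rule has to look at another column.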

For example:

| **ID** | **PARENT_ID** | **STARTED** | **DURATION** |
|---|---|---|---|
| test_01 | P_1 | 2017-01-12 10:21:53.000000 | 32 |
| test_02 | P_1 | 2017-01-12 10:22:53.000000 | 50 |
| test_03 | P_1 | 2017-01-12 11:23:53.000000 | 19 |
| test_04 | P_1 | 2017-01-12 11:24:53.000000 | 39 |
| test_05 | P_1 | 2017-01-12 12:25:53.000000 | 49 |
| test_06 | P_1 | 2017-01-12 12:26:53.000000 | 59 |
| test_07 | P_1 | 2017-01-12 13:27:53.000000 | 69 |
| test_08 | P_1 | 2017-01-12 13:28:53.000000 | 79 |
| test_09 | P_1 | 2017-01-12 14:29:53.000000 | 98 |
| test_10 | P_1 | 2017-01-12 15:30:53.000000 | 99 |

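In this case I expect most_pop_call_start to be 13. The highest COUNT(*) per EXTRACT(HOUR) group is 2, and more than one group (hours 10, 11, 12 and 13) has that count, so I would then look at the duration column: 79, in hour 13, is the largest duration within that subset.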
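The rule just described (highest call count first, then the longest single call as tie-breaker) can be checked outside the database with a short Python sketch. `most_popular_hour` is a hypothetical helper; its final fallback to the earliest hour, when count and longest call both tie, is an assumption the question does not spell out, chosen to match what `MIN(hr)` does in the KEEP query below.

```python
from collections import Counter
from datetime import datetime


def most_popular_hour(calls):
    """Pick the start hour with the most calls.

    Ties on the call count are broken by the longest single call in that hour;
    if that still ties, the earliest hour wins (assumption, mirrors MIN(hr)).
    `calls` is an iterable of (started, duration) pairs.
    """
    counts = Counter()
    max_duration = {}
    for started, duration in calls:
        hour = datetime.strptime(started, "%Y-%m-%d %H:%M:%S.%f").hour
        counts[hour] += 1
        max_duration[hour] = max(max_duration.get(hour, duration), duration)
    if not counts:
        return None
    # Larger count first, then larger max duration, then smaller hour (via -h).
    return max(counts, key=lambda h: (counts[h], max_duration[h], -h))


# STARTED and DURATION values from the example table.
calls = [
    ("2017-01-12 10:21:53.000000", 32),
    ("2017-01-12 10:22:53.000000", 50),
    ("2017-01-12 11:23:53.000000", 19),
    ("2017-01-12 11:24:53.000000", 39),
    ("2017-01-12 12:25:53.000000", 49),
    ("2017-01-12 12:26:53.000000", 59),
    ("2017-01-12 13:27:53.000000", 69),
    ("2017-01-12 13:28:53.000000", 79),
    ("2017-01-12 14:29:53.000000", 98),
    ("2017-01-12 15:30:53.000000", 99),
]
print(most_popular_hour(calls))  # 13
```

Comparing tuples lets one `max` call express the whole ordering, which is the same idea as listing several keys in an SQL `ORDER BY`.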

Try:

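```sql
-- "table" is a placeholder: TABLE is a reserved word in Oracle, so use the real table name
-- among the hours ranked last by (cnt, max_duration), return the smallest hour
select min( hr ) keep ( dense_rank last order by cnt, max_duration ) as most_pop_call_start,
       -- overall average rebuilt from the per-hour totals
       sum( duration ) / sum( cnt ) as avg_duration
from (
    -- one row per start hour
    select extract(HOUR from started) as hr,
           count(*)  as cnt,
           sum( duration ) as duration,
           max( duration )  as max_duration
    from table
    where parent_id = 'xxx'
    group by extract(HOUR from started)
);
```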
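`MIN(hr) KEEP (DENSE_RANK LAST ORDER BY cnt, max_duration)` keeps only the per-hour rows that rank last (highest `cnt`, then highest `max_duration`) and returns the smallest `hr` among them, while `sum(duration) / sum(cnt)` rebuilds the overall average from the per-hour totals. SQLite has no KEEP clause, so this sketch emulates it with `DENSE_RANK() OVER (ORDER BY cnt DESC, max_duration DESC)` and keeps rank 1. It runs through Python's built-in `sqlite3` module, assumes SQLite 3.25 or later for window functions, uses `strftime('%H', ...)` in place of Oracle's `EXTRACT(HOUR FROM ...)`, and the table name `phone_call` and function name are hypothetical.

```python
import sqlite3

# Rows from the example table: (id, parent_id, started, duration).
ROWS = [
    ("test_01", "P_1", "2017-01-12 10:21:53.000000", 32),
    ("test_02", "P_1", "2017-01-12 10:22:53.000000", 50),
    ("test_03", "P_1", "2017-01-12 11:23:53.000000", 19),
    ("test_04", "P_1", "2017-01-12 11:24:53.000000", 39),
    ("test_05", "P_1", "2017-01-12 12:25:53.000000", 49),
    ("test_06", "P_1", "2017-01-12 12:26:53.000000", 59),
    ("test_07", "P_1", "2017-01-12 13:27:53.000000", 69),
    ("test_08", "P_1", "2017-01-12 13:28:53.000000", 79),
    ("test_09", "P_1", "2017-01-12 14:29:53.000000", 98),
    ("test_10", "P_1", "2017-01-12 15:30:53.000000", 99),
]

QUERY = """
WITH per_hour AS (
    -- same per-hour aggregation as the inner query above
    SELECT CAST(strftime('%H', started) AS INTEGER) AS hr,
           COUNT(*)      AS cnt,
           SUM(duration) AS duration,
           MAX(duration) AS max_duration
    FROM phone_call
    WHERE parent_id = ?
    GROUP BY hr
),
ranked AS (
    -- rank 1 here plays the role of Oracle's DENSE_RANK LAST ORDER BY cnt, max_duration
    SELECT hr, cnt, duration,
           DENSE_RANK() OVER (ORDER BY cnt DESC, max_duration DESC) AS rnk
    FROM per_hour
)
SELECT (SELECT MIN(hr) FROM ranked WHERE rnk = 1) AS most_pop_call_start,
       CAST(SUM(duration) AS REAL) / SUM(cnt)      AS avg_duration
FROM ranked
"""


def most_popular_hour_sqlite(rows, parent_id):
    """Return (most_pop_call_start, avg_duration) for one parent_id."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE phone_call (id TEXT, parent_id TEXT, started TEXT, duration INTEGER)"
        )
        con.executemany("INSERT INTO phone_call VALUES (?, ?, ?, ?)", rows)
        return con.execute(QUERY, (parent_id,)).fetchone()
    finally:
        con.close()


print(most_popular_hour_sqlite(ROWS, "P_1"))  # (13, 59.3)
```

On the example data the rank-1 group is hour 13 (count 2, longest call 79), and the average is 593 / 10 = 59.3.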

I solved it starting from krokodilko's proposal.

Here is the whole query:

```sql
SELECT tp.d_id, u.email, ta.email_subject AS headline, ta.start_date, ta.closing_date
     , tp.call_cost, SUM(tpc.duration) AS duration, COUNT(tpc.id) AS call_count, ta.currency
     , ( SELECT MIN(hr) KEEP ( DENSE_RANK LAST ORDER BY cnt, max_duration, l_id ) AS most_pop_call_start
         FROM ( SELECT EXTRACT(HOUR FROM started) AS hr, COUNT(*) AS cnt
                     , SUM(duration) AS duration, MAX(duration) AS max_duration
                     , phone_number_id AS l_id
                FROM table_phone_call
                GROUP BY EXTRACT(HOUR FROM started), phone_number_id )
         WHERE l_id = tpc.phone_number_id
       ) AS most_pop_call_start
  --, STATS_MODE(extract(HOUR from tpc.started)) AS most_pop_call_start
     , AVG(tpc.duration) AS avg_duration
     , tb.name AS business_name
FROM table_a ta
  JOIN table_phone tp ON ta.id = tp.d_id
  JOIN table_phone_call tpc ON tp.phone_id = tpc.phone_number_id
  JOIN table_b tb ON ta.business_id = tb.id
  JOIN users u ON tb.id = u.business_id
WHERE phone_id IS NOT NULL
  AND TRUNC(ta.closing_date) = to_date('#{jobParameters['dateCriteria']}', 'dd-mm-yyyy')
GROUP BY tp.d_id, ta.email_subject, ta.start_date, ta.closing_date, tp.call_cost
       , tpc.phone_number_id, u.email, ta.currency, tb.name
```
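A note on the tie-breakers in this query: the `WHERE l_id = tpc.phone_number_id` filter leaves a single `l_id` inside the scalar subquery, so `l_id` in the `ORDER BY` never breaks a tie; when two hours still tie on `cnt` and `max_duration`, `MIN(hr)` returns the earlier one. As an alternative (not what was used above), a single pass with `ROW_NUMBER() OVER (PARTITION BY phone_number_id ...)` computes the hour for every phone number at once, and that result could then be joined to the main query instead of using a correlated subquery. The sketch runs the idea against SQLite from Python; it assumes SQLite 3.25+ for window functions, uses `strftime('%H', ...)` in place of `EXTRACT(HOUR FROM ...)`, reuses the example table as phone number 1, and the calls for phone number 2 and the function name are hypothetical.

```python
import sqlite3

# Phone 1 reuses the example table; phone 2 is hypothetical and ties fully on hours 9 and 16.
EXAMPLE = [
    ("2017-01-12 10:21:53.000000", 32), ("2017-01-12 10:22:53.000000", 50),
    ("2017-01-12 11:23:53.000000", 19), ("2017-01-12 11:24:53.000000", 39),
    ("2017-01-12 12:25:53.000000", 49), ("2017-01-12 12:26:53.000000", 59),
    ("2017-01-12 13:27:53.000000", 69), ("2017-01-12 13:28:53.000000", 79),
    ("2017-01-12 14:29:53.000000", 98), ("2017-01-12 15:30:53.000000", 99),
]
CALLS = [(1, started, duration) for started, duration in EXAMPLE] + [
    (2, "2017-01-12 09:05:00.000000", 10),
    (2, "2017-01-12 09:40:00.000000", 20),
    (2, "2017-01-12 16:00:00.000000", 5),
    (2, "2017-01-12 16:10:00.000000", 20),
]

QUERY = """
WITH per_hour AS (
    SELECT phone_number_id,
           CAST(strftime('%H', started) AS INTEGER) AS hr,
           COUNT(*)      AS cnt,
           MAX(duration) AS max_duration
    FROM table_phone_call
    GROUP BY phone_number_id, hr
),
ranked AS (
    -- hr ASC as the last key reproduces MIN(hr) among fully tied hours
    SELECT phone_number_id, hr,
           ROW_NUMBER() OVER (
               PARTITION BY phone_number_id
               ORDER BY cnt DESC, max_duration DESC, hr ASC
           ) AS rn
    FROM per_hour
)
SELECT phone_number_id, hr AS most_pop_call_start
FROM ranked
WHERE rn = 1
ORDER BY phone_number_id
"""


def most_popular_hour_per_phone(calls):
    """Return [(phone_number_id, most_pop_call_start), ...] for every phone number."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE table_phone_call (phone_number_id INTEGER, started TEXT, duration INTEGER)"
        )
        con.executemany("INSERT INTO table_phone_call VALUES (?, ?, ?)", calls)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(most_popular_hour_per_phone(CALLS))  # [(1, 13), (2, 9)]
```

For phone number 2, hours 9 and 16 both have two calls with a longest call of 20, so the earlier hour, 9, is returned, just as `MIN(hr)` would.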

Many thanks.