如何使用分析函数填充缺失值?

How to fill missing values using analytical functions?

我想从我的数据集中填充缺失的空值。我有这样的数据集

+---------------------+------+-------------+
| ORDER_DATE          | SHOP | SALESPERSON |
+---------------------+------+-------------+
| 14/04/2017 04:44:27 | A    | MIKE        |
+---------------------+------+-------------+
| 14/04/2017 04:44:55 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:07 | A    | TIM         |
+---------------------+------+-------------+
| 14/04/2017 04:45:30 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:43 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:46:13 | B    | JOHN        |
+---------------------+------+-------------+
| 14/04/2017 04:46:28 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:32 | C    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |
+---------------------+------+-------------+

而且我喜欢用店铺内的空值之前的第一个找到的值来填充按店铺划分的销售员信息。我试过了,但这并没有产生正确的结果(如下)。如何解决?

CREATE TABLE SALES (
ORDER_DATE DATE, 
SHOP VARCHAR2(30 CHAR), 
SALESPERSON VARCHAR2(30 CHAR)
)
;

REM INSERTING INTO SALES
SET DEFINE OFF;
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:27','DD/MM/YYYY HH24:MI:SS'),'A','MIKE');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:55','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:07','DD/MM/YYYY HH24:MI:SS'),'A','TIM');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:30','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:43','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:13','DD/MM/YYYY HH24:MI:SS'),'B','JOHN');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:28','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:32','DD/MM/YYYY HH24:MI:SS'),'C',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:41','DD/MM/YYYY HH24:MI:SS'),'C','MELINDA');
COMMIT;

SELECT * FROM SALES ORDER BY SHOP, ORDER_DATE;

SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,
       /*tried two approaches*/
       /*does not produce a correct result set*/
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER (PARTITION BY SHOP
                   ORDER BY ORDER_DATE RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_VALUE_1,
       /*this also does not solve this*/            
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER(PARTITION BY SHOP
                  ORDER BY ORDER_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS LAST_VALUE_2
FROM SALES ;

正确的结果集是:

+---------------------+------+-------------+--------------------+
| ORDER_DATE          | SHOP | SALESPERSON | SALESPERSON_FILLED |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:27 | A    | MIKE        |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:55 | A    |             |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:07 | A    | TIM         |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:30 | A    |             |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:43 | B    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:13 | B    | JOHN        |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:28 | B    |             |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:32 | C    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |  MELINDA           |
+---------------------+------+-------------+--------------------+

以下查询在 SQL 服务器中有效。我看不出它在 Oracle 中不起作用的任何原因:

SELECT ORDER_DATE, SHOP, 
       MAX(SALESPERSON) OVER (PARTITION BY SHOP, grp) AS SALESPERSON
FROM (
   SELECT ORDER_DATE, SHOP, SALESPERSON,
          SUM(CASE WHEN SALESPERSON IS NOT NULL THEN 1 END) 
          OVER
          (PARTITION BY SHOP ORDER BY ORDER_DATE) AS grp
   FROM mytable) AS t
ORDER BY ORDER_DATE

这是内部查询产生的结果:

ORDER_DATE              SHOP SALESPERSON  grp
---------------------------------------------
2017-04-14 04:44:27.000 A    MIKE         1
2017-04-14 04:44:55.000 A    NULL         1
2017-04-14 04:45:07.000 A    TIM          2
2017-04-14 04:45:30.000 A    NULL         2
2017-04-14 04:45:43.000 B    NULL         NULL
2017-04-14 04:46:13.000 B    JOHN         1
2017-04-14 04:46:28.000 B    NULL         1
2017-04-14 04:58:32.000 C    NULL         NULL
2017-04-14 04:58:41.000 C    MELINDA      1

因此,使用字段 grp 和字段 SHOP,我们可以识别 'island' 应该共享相同 SALESPERSON 值的记录。

你们非常亲密。
试试这个:

SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,

       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER 
            (PARTITION BY SHOP ORDER BY ORDER_DATE ) AS LAST_VALUE_1

FROM SALES
order by shop, order_date;

ORDER_DA SHOP                           SALESPERSON                    LAST_VALUE_1                  
-------- ------------------------------ ------------------------------ ------------------------------
17/04/14 A                              MIKE                           MIKE                          
17/04/14 A                                                             MIKE                          
17/04/14 A                              TIM                            TIM                           
17/04/14 A                                                             TIM                           
17/04/14 B                                                                                           
17/04/14 B                              JOHN                           JOHN                          
17/04/14 B                                                             JOHN                          
17/04/14 C                                                                                           
17/04/14 C                              MELINDA                        MELINDA                       

9 rows selected. 

在 SQL 服务器 2008 中, 使用 CASE WHEN, ORDER BY :

代码:

 SELECT CONVERT(DATETIME,ORDER_DATE) AS ORDER_DATE,
    ISNULL(SHOP,'') AS SHOP,
    ISNULL(SALESPERSON,'') AS SALESPERSON,
    CASE WHEN SALESPERSON IS NULL OR SALESPERSON = '' THEN 
    ISNULL((SELECT TOP 1 ISNULL(SALESPERSON,'')  FROM SALES_stack WHERE ORDER_DATE < S.ORDER_DATE
    AND ISNULL(SHOP,'') = ISNULL(S.SHOP,'') 
    ORDER BY ORDER_DATE DESC),'')
    ELSE  ISNULL(SALESPERSON,'') END AS SALESPERSON_FILLED FROM SALES_stack S

输出:

    ORDER_DATE              SHOP   SALESPERSON  SALESPERSON_FILLED
    2017-04-14 04:44:27.000 A      MIKE         MIKE
    2017-04-14 04:44:55.000 A                   MIKE
    2017-04-14 04:45:07.000 A      TIM          TIM
    2017-04-14 04:45:30.000 A                   TIM
    2017-04-14 04:45:43.000 B       
    2017-04-14 04:46:13.000 B      JOHN         JOHN
    2017-04-14 04:46:28.000 B                   JOHN
    2017-04-14 04:58:32.000 C       
    2017-04-14 04:58:41.000 C      MELINDA      MELINDA