Combining COUNT and RANK - PostgreSQL

Question

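I need to select, for every 'id_customer' in the user table, the total number of trips together with the ID, dispatch_seconds and distance of that customer's first order. id_customer, customer_id and order_id are strings.

It should look like this: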

+------+--------+------------+--------------------------+------------------+
|  id  | count  | #1order id | #1order dispatch seconds | #1order distance |
+------+--------+------------+--------------------------+------------------+
| 1ar5 |      3 | 4r56       |                        1 |              500 |
| 2et7 |      2 | dc1f       |                        5 |              100 |
+------+--------+------------+--------------------------+------------------+

Cheers!

The original post was edited during the discussion, and S-man helped me find the exact solution to the problem. S-man's answer: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=e16aa6008990107e55a26d05b10b02b5

Answer 1

You can use window functions:

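select distinct customer_id,
       count(*) over (partition by customer_id) as no_of_order,
       -- first_value returns the order_id of the earliest row in each partition;
       -- min() with an ORDER BY would return a running minimum per row instead
       first_value(order_id) over (partition by customer_id order by order_timestamp) as first_order_id
from orders o;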
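The same window-function idea can be stretched to return every column the question asks for. This is a sketch, not the answerer's code: the rows in the `orders` CTE are hypothetical sample data (only each customer's earliest order matches the result table shown in Answer 3), and the column names order_timestamp, dispatch_seconds and distance are assumed from the other answers. A named WINDOW clause avoids repeating the ordered window three times.

```sql
-- Hypothetical sample rows standing in for the real orders table.
WITH orders (customer_id, order_id, order_timestamp, dispatch_seconds, distance) AS (
    VALUES
        ('1ar5', '4r56', timestamp '2018-08-16 17:24:00', 1, 500),
        ('1ar5', '0b7a', timestamp '2018-08-17 09:10:00', 3, 250),
        ('1ar5', '9zz1', timestamp '2018-08-18 12:00:00', 2, 700),
        ('2et7', 'a1c3', timestamp '2018-08-16 08:00:00', 4, 300),
        ('2et7', 'dc1f', timestamp '2018-08-15 01:24:00', 5, 100)
)
SELECT DISTINCT
    customer_id,
    -- no ORDER BY in this window, so the count covers the whole partition
    count(*)                      OVER (PARTITION BY customer_id) AS order_count,
    -- w orders each customer's rows by time, so first_value reads the earliest row
    first_value(order_id)         OVER w AS first_order_id,
    first_value(dispatch_seconds) OVER w AS first_dispatch_seconds,
    first_value(distance)         OVER w AS first_distance
FROM orders
WINDOW w AS (PARTITION BY customer_id ORDER BY order_timestamp)
ORDER BY customer_id;
```

Every row of a customer carries identical values in these columns, so SELECT DISTINCT collapses them to one row per customer. On the sample rows this gives 1ar5 | 3 | 4r56 | 1 | 500 and 2et7 | 2 | dc1f | 5 | 100.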

Answer 2

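I think there are quite a few errors in your original query: your rank has no partition, the order by clause doesn't look right, and you filter out every order except one "random" one before applying the count, and the list goes on.

Is something like this closer to what you want?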

SELECT
    customer_id,
    order_count,
    order_id
FROM (
    SELECT
        customer_id,
        order_id,
        -- window functions run before the outer WHERE, so this counts all of the customer's orders
        COUNT(*) OVER (PARTITION BY customer_id) AS order_count,
        -- partition by customer only; adding order_id would put every order in its own partition
        RANK() OVER (PARTITION BY customer_id ORDER BY order_timestamp) AS rank_id
    FROM
        orders) b
WHERE
    b.rank_id = 1;
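To see the RANK approach end to end, here is a sketch against hypothetical sample rows (the CTE values are made up; only each customer's earliest order matches the result table in Answer 3). It keeps the window COUNT(*) in the same subquery as RANK(), partitions both by customer_id alone, and also carries dispatch_seconds and distance through so the output matches the layout the question asks for.

```sql
-- Hypothetical sample rows standing in for the real orders table.
WITH orders (customer_id, order_id, order_timestamp, dispatch_seconds, distance) AS (
    VALUES
        ('1ar5', '4r56', timestamp '2018-08-16 17:24:00', 1, 500),
        ('1ar5', '0b7a', timestamp '2018-08-17 09:10:00', 3, 250),
        ('1ar5', '9zz1', timestamp '2018-08-18 12:00:00', 2, 700),
        ('2et7', 'a1c3', timestamp '2018-08-16 08:00:00', 4, 300),
        ('2et7', 'dc1f', timestamp '2018-08-15 01:24:00', 5, 100)
)
SELECT customer_id, order_count, order_id, dispatch_seconds, distance
FROM (
    SELECT o.*,
           -- evaluated before the outer WHERE, so it still sees every order of the customer
           COUNT(*) OVER (PARTITION BY o.customer_id) AS order_count,
           -- 1 for the customer's earliest order
           RANK() OVER (PARTITION BY o.customer_id ORDER BY o.order_timestamp) AS rank_id
    FROM orders o
) ranked
WHERE rank_id = 1
ORDER BY customer_id;
```

RANK() gives tied rows the same rank, so a customer with two orders at exactly the same earliest timestamp would produce two rows here; ROW_NUMBER() would return exactly one, arbitrarily chosen, row instead.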

Answer 3

db<>fiddle

SELECT
    customer_id,
    count,
    order_id,
    order_timestamp,
    dispatch_seconds,
    distance
FROM (
    SELECT
        *,
        count(*) over (partition by customer_id),    -- A (unaliased, so the column is named "count")
        first_value(order_id) over (partition by customer_id order by order_timestamp) -- B (column named "first_value")
    FROM orders
) s
WHERE order_id = first_value; -- C

https://www.postgresql.org/docs/current/static/tutorial-window.html

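A: a window function that gets the total number of records for each user.

B: a window function that orders each user's records by timestamp and returns that user's first order_id. Using first_value instead of min has one advantage: your order IDs may not actually increase with the timestamp (perhaps two orders came in at the same time, or your order IDs are not sequential but some kind of hash).

--> Both are new columns.

C: now keep all columns of the rows where "first_value" (i.e. the first order_id by timestamp) equals the current row's order_id. This yields the complete row of each user's first order.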

Result:

customer_id  count  order_id  order_timestamp      dispatch_seconds  distance  
-----------  -----  --------  -------------------  ----------------  --------  
1ar5         3      4r56      2018-08-16 17:24:00  1                 500       
2et7         2      dc1f      2018-08-15 01:24:00  5                 100

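Note that in this test data, the order "dc1f" of user "2et7" has the smaller timestamp but appears later in the table. It is not the user's first row in the table, but it is still that user's earliest order. This should illustrate the first_value vs. min case described above.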
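The difference between min and first_value is easy to see on data where the alphabetically smallest order_id is not the earliest one. In the hypothetical rows below (invented for illustration, with 0b7a and a1c3 deliberately sorting before the real first orders), the two columns disagree for both customers.

```sql
-- Hypothetical sample rows standing in for the real orders table.
WITH orders (customer_id, order_id, order_timestamp, dispatch_seconds, distance) AS (
    VALUES
        ('1ar5', '4r56', timestamp '2018-08-16 17:24:00', 1, 500),
        ('1ar5', '0b7a', timestamp '2018-08-17 09:10:00', 3, 250),
        ('1ar5', '9zz1', timestamp '2018-08-18 12:00:00', 2, 700),
        ('2et7', 'a1c3', timestamp '2018-08-16 08:00:00', 4, 300),
        ('2et7', 'dc1f', timestamp '2018-08-15 01:24:00', 5, 100)
)
SELECT DISTINCT
    customer_id,
    -- smallest order_id by string comparison, regardless of time
    min(order_id) OVER (PARTITION BY customer_id) AS smallest_order_id,
    -- order_id of the row with the earliest order_timestamp
    first_value(order_id) OVER (PARTITION BY customer_id ORDER BY order_timestamp) AS earliest_order_id
FROM orders
ORDER BY customer_id;
```

On these rows smallest_order_id comes out as 0b7a and a1c3, while earliest_order_id is 4r56 and dc1f, which is the behaviour the first_value approach relies on.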

Answer 4

You're on the right track. Just use conditional aggregation:

SELECT o.customer_id, COUNT(*),
       MAX(CASE WHEN seqnum = 1 THEN o.order_id END) as first_order_id
FROM (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_timestamp ASC) as seqnum
      FROM orders o
     ) o
GROUP BY o.customer_id;
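The conditional aggregation extends naturally to the other first-order columns: one MAX(CASE ...) per column, each picking the value from the seqnum = 1 row. This is a sketch over hypothetical sample rows (made up for illustration), not the answerer's exact query.

```sql
-- Hypothetical sample rows standing in for the real orders table.
WITH orders (customer_id, order_id, order_timestamp, dispatch_seconds, distance) AS (
    VALUES
        ('1ar5', '4r56', timestamp '2018-08-16 17:24:00', 1, 500),
        ('1ar5', '0b7a', timestamp '2018-08-17 09:10:00', 3, 250),
        ('1ar5', '9zz1', timestamp '2018-08-18 12:00:00', 2, 700),
        ('2et7', 'a1c3', timestamp '2018-08-16 08:00:00', 4, 300),
        ('2et7', 'dc1f', timestamp '2018-08-15 01:24:00', 5, 100)
)
SELECT o.customer_id,
       COUNT(*) AS order_count,
       -- each CASE is non-NULL only on the customer's earliest row (seqnum = 1),
       -- so MAX simply returns that single value
       MAX(CASE WHEN seqnum = 1 THEN o.order_id END)         AS first_order_id,
       MAX(CASE WHEN seqnum = 1 THEN o.dispatch_seconds END) AS first_dispatch_seconds,
       MAX(CASE WHEN seqnum = 1 THEN o.distance END)         AS first_distance
FROM (SELECT o.*,
             ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_timestamp ASC) AS seqnum
      FROM orders o
     ) o
GROUP BY o.customer_id
ORDER BY o.customer_id;
```

Since PostgreSQL 9.4 the same thing can also be written with the aggregate FILTER clause, e.g. MAX(o.distance) FILTER (WHERE seqnum = 1).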

Your JOIN isn't necessary for this query.
