Get Lead value over multiple partitions

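I have a problem that I think can be solved with LAG/LEAD plus partitioning, but I can't work it out.

Roughly every two years, clients are invited to take part in a research project. A number of clients are selected for each project, and some clients are selected for multiple research projects. Those selected receive an invitation, although in some cases no invitation is sent. If a client does not respond to an invitation, a second invitation (a reminder) is sent. A third and a fourth are possible as well.

I need to find out whether a client has received an invitation for a previous research project (and, optionally, which invitation it was).

The dataset looks like this: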

clientID | projectID | invitationID
  14     |    267    |     489
  14     |    267    |     325
  16     |    385    |     475
  17     |    546    |     NULL
  17     |    547    |     885
  17     |    548    |     901
  18     |    721    |     905
  18     |    834    |     906
  18     |    834    |     907
  19     |    856    |     908
  19     |    856    |     929
  19     |    857    |     931
  19     |    857    |     945
  19     |    858    |     NULL


Client 14 has had 2 invitations for the same research-project
Client 16 has had 1 invitation for 1 research-project
Client 17 has been selected for 3 research-projects but opted out of project 546, receiving 1 invitation each for the following two projects.
Client 18 has been selected for 2 research-projects. For the second project they received 2 invitations.
Client 19 has been selected for 3 research-projects. For the first two a reminder was sent. Client 19 was selected for project 858 but opted out, so no invitation was sent.

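Now I need to determine, for each client, whether they received an invitation for a previous research project (and, optionally, which invitation that was). I only need the first invitation if there are several. So the resulting dataset should look like this (the part in brackets is optional):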

clientID | projectID | invitationID | InvitedForPreviousProject
  14     |    267    |     489      |      0
  14     |    267    |     325      |      0
  16     |    385    |     475      |      0
  17     |    546    |     NULL     |      0
  17     |    547    |     885      |      0
  17     |    548    |     901      |      1 (885)
  18     |    721    |     905      |      0
  18     |    834    |     906      |      1 (905)
  18     |    834    |     907      |      1 (905)
  19     |    856    |     908      |      0
  19     |    856    |     929      |      0
  19     |    857    |     931      |      1 (908)
  19     |    857    |     945      |      1 (908)
  19     |    858    |     NULL     |      1 (931)

Can this be done using LEAD, RANK, or DENSE_RANK? Here is a create statement including the data:

declare @table table (
    [clientID] [int] NULL,
    [projectID] [int] NULL,
    [invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL)

Does this help?

declare @table table (
    [clientID] [int] NULL,
    [projectID] [int] NULL,
    [invitationID] [int] NULL
)
INSERT @table ([clientID], [projectID], [invitationID]) VALUES
(14, 267, 489),
(14, 267, 325),
(16, 385, 475),
(17, 546, NULL),
(17, 547, 885),
(17, 548, 901),
(18, 721, 905),
(18, 834, 906),
(18, 834, 907),
(19, 856, 908),
(19, 856, 929),
(19, 857, 931),
(19, 857, 945),
(19, 858, NULL);

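-- The query uses DENSE_RANK() and a correlated subquery

WITH ranked AS
(
    -- Number each client's projects 1, 2, 3, ... in projectID order;
    -- all invitations of one project share the same rank
    SELECT t.* 
         ,DENSE_RANK() OVER(PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
    FROM @table t
)
SELECT r.*
      ,earlierProject.invitationID
FROM ranked r
-- For each row, fetch one row of the same client's directly preceding project
OUTER APPLY(SELECT TOP 1 *
            FROM ranked r2 
            WHERE r2.clientID=r.clientID
             AND  r2.projectID<r.projectID 
             AND  r2.InvRank=r.InvRank-1   
            ORDER BY invitationID ASC -- lowest invitationID is taken as the first invitation
            ) earlierProject
ORDER BY r.clientID,r.projectID,r.invitationID;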

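The returned invitationID is NULL wherever your expected output shows "0", and it holds the wanted value whenever an earlier project with an invitation was found.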
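If the 0/1 flag from the question is wanted as a column of its own, it can be derived from that NULL check. The following is only a sketch under the same assumptions as the query above (the lowest invitationID counts as the first invitation); the alias `PreviousInvitationID` is a name made up for illustration, and the setup is repeated in compact form so the batch runs on its own.

```sql
DECLARE @table TABLE (clientID int NULL, projectID int NULL, invitationID int NULL);
INSERT @table (clientID, projectID, invitationID) VALUES
(14, 267, 489), (14, 267, 325), (16, 385, 475),
(17, 546, NULL), (17, 547, 885), (17, 548, 901),
(18, 721, 905), (18, 834, 906), (18, 834, 907),
(19, 856, 908), (19, 856, 929), (19, 857, 931), (19, 857, 945), (19, 858, NULL);

WITH ranked AS
(
    -- Same ranking as before: one rank per project within each client
    SELECT t.*,
           DENSE_RANK() OVER (PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
    FROM @table t
)
SELECT r.clientID, r.projectID, r.invitationID,
       -- 1 when the preceding project has an invitation, otherwise 0
       CASE WHEN earlierProject.invitationID IS NULL THEN 0 ELSE 1 END AS InvitedForPreviousProject,
       earlierProject.invitationID AS PreviousInvitationID  -- hypothetical alias
FROM ranked r
OUTER APPLY (SELECT TOP 1 r2.invitationID
             FROM ranked r2
             WHERE r2.clientID = r.clientID
               AND r2.InvRank = r.InvRank - 1
             ORDER BY r2.invitationID ASC) earlierProject
ORDER BY r.clientID, r.projectID, r.invitationID;
```

With the sample data, client 17 gets 0 for project 547 (the preceding project 546 has no invitation) and 1 with 885 for project 548.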

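Hint

Actually, the APPLY is not necessary. If you only need the invitationID, you can place the subquery directly as a column (slightly faster). But the APPLY version is easier to read, and it lets you read other columns of the earlier row as well.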
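A minimal sketch of that column-subquery variant could look like the following. It assumes the same ranking CTE as the main query; the alias `PreviousInvitationID` is hypothetical, and the setup is repeated in compact form so the batch is self-contained. No claim is made here about how much faster it actually runs.

```sql
DECLARE @table TABLE (clientID int NULL, projectID int NULL, invitationID int NULL);
INSERT @table (clientID, projectID, invitationID) VALUES
(14, 267, 489), (14, 267, 325), (16, 385, 475),
(17, 546, NULL), (17, 547, 885), (17, 548, 901),
(18, 721, 905), (18, 834, 906), (18, 834, 907),
(19, 856, 908), (19, 856, 929), (19, 857, 931), (19, 857, 945), (19, 858, NULL);

WITH ranked AS
(
    SELECT t.*,
           DENSE_RANK() OVER (PARTITION BY t.clientID ORDER BY t.projectID) AS InvRank
    FROM @table t
)
SELECT r.clientID, r.projectID, r.invitationID,
       -- Correlated scalar subquery: returns a single value,
       -- or NULL when the client has no preceding project
       (SELECT TOP 1 r2.invitationID
        FROM ranked r2
        WHERE r2.clientID = r.clientID
          AND r2.InvRank = r.InvRank - 1
        ORDER BY r2.invitationID ASC) AS PreviousInvitationID  -- hypothetical alias
FROM ranked r
ORDER BY r.clientID, r.projectID, r.invitationID;
```

Because a scalar subquery can return only one column, this form fits when nothing but the invitationID is needed.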

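You need a column that specifies the ordering. Let me assume that there is an invitation date along with the other columns.

With this information, your flag is easily calculated by comparing two values:

  • the minimum invitation date for the client
  • the minimum invitation date for the client/project id

When these are the same, this is the first project with an invitation.

So: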

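-- invitationDate is an assumed column; it is not part of @table as posted.
-- A project without any invitation date yields NULL for the second MIN,
-- so the comparison is unknown and the row falls into the ELSE branch (flag 1).
select t.*,
       (case when min(invitationDate) over (partition by clientId) =
                  min(invitationDate) over (partition by clientId, projectId)
             then 0 else 1
        end) as InvitedForPreviousProject
from @table t;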
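Since the posted data has no date column, here is a runnable sketch of the same comparison on a small table variable. Everything about the dates is an assumption: the table name `@invitations`, the column `invitationDate`, and the date values themselves are invented purely for illustration and do not come from the question.

```sql
-- Hypothetical table: same columns as the question plus an assumed invitationDate
DECLARE @invitations TABLE (clientID int, projectID int, invitationID int, invitationDate date);
INSERT @invitations (clientID, projectID, invitationID, invitationDate) VALUES
(16, 385, 475, '20160501'),  -- made-up dates, for illustration only
(18, 721, 905, '20150301'),
(18, 834, 906, '20170301'),
(18, 834, 907, '20170401');

SELECT i.*,
       -- Earliest date per client vs. earliest date per client/project:
       -- equal means this is the client's first invited project
       CASE WHEN MIN(i.invitationDate) OVER (PARTITION BY i.clientID) =
                 MIN(i.invitationDate) OVER (PARTITION BY i.clientID, i.projectID)
            THEN 0 ELSE 1
       END AS InvitedForPreviousProject
FROM @invitations i
ORDER BY i.clientID, i.projectID, i.invitationID;
```

On these rows the flag is 0 for client 16 and for client 18's project 721, and 1 for both invitations of project 834.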