Find the most recent date in a result set

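I'm working on a query where I need to look at patient vitals (specifically blood pressure) that were entered when a patient visited a clinic. I'm pulling results for the whole of 2015. Some patients visited more than once, and I only need to see the vitals entered at the most recent visit. Another wrinkle is that systolic and diastolic pressure are entered separately, so I end up with results like this: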

Patient ID     Name           DOB          Test              Results       Date
---------------------------------------------------------------------------------
1000           John Smith     1/1/1955     BP - Diastolic    120           2/10/2015
1000           John Smith     1/1/1955     BP - Systolic     70            2/10/2015
1000           John Smith     1/1/1955     BP - Diastolic    128           7/12/2015
1000           John Smith     1/1/1955     BP - Systolic     75            7/12/2015
1000           John Smith     1/1/1955     BP - Diastolic    130           10/22/2015
1000           John Smith     1/1/1955     BP - Systolic     76            10/22/2015
9999           Jane Doe       5/4/1970     BP - Diastolic    130           4/2/2015
9999           Jane Doe       5/4/1970     BP - Systolic     60            4/2/2015
9999           Jane Doe       5/4/1970     BP - Diastolic    127           11/20/2015
9999           Jane Doe       5/4/1970     BP - Systolic     65            11/20/2015

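There are more than 26,000 results, so obviously I don't want to go through every patient and check when their most recent result was. I'd like my results to look like this: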

Patient ID     Name           DOB          Test              Results       Date
---------------------------------------------------------------------------------
1000           John Smith     1/1/1955     BP - Diastolic    130           10/22/2015
1000           John Smith     1/1/1955     BP - Systolic     76            10/22/2015
9999           Jane Doe       5/4/1970     BP - Diastolic    127           11/20/2015
9999           Jane Doe       5/4/1970     BP - Systolic     65            11/20/2015

I know the name, date of birth, and so on will repeat, but I'm mainly concerned with the Results column.

This is my query:

SELECT DISTINCT
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    v.TestDate as Date

FROM PatientDemographic pd JOIN Vitals v ON pd.PatientID = v.PatientID

WHERE v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%'

ORDER BY pd.PatientID, v.TestDate

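After looking through other answers, I tried a GROUP BY with the MAX() aggregate function on the v.TestDate column in the SELECT statement (I specifically referenced this link, although it is for Oracle and I'm using SQL Server, so I'm not entirely sure the syntax is the same). My query looked like this: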

SELECT DISTINCT
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    MAX(v.TestDate) as Date

FROM PatientDemographic pd JOIN Vitals v ON pd.PatientID = v.PatientID

WHERE v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%'

GROUP BY pd.PatientID

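Admittedly, I've always struggled a bit with GROUP BY. In this particular case, I get an error message saying that I also need to add the PatientName column to the GROUP BY clause, so I did, and then it asked for DOB. Then the test name. Basically, it wants me to add everything in my SELECT statement to the GROUP BY.

What's the best way to get only the most recent visit for each patient?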

One simple method is to use ROW_NUMBER() to find the most recent record for each patient and test:

SELECT pd.PatientID as [Patient ID], pd.PatientName as Name, pd.DateOfBirth as DOB,
       v.Test as Test, v.Results as Results, v.TestDate as Date
FROM PatientDemographic pd JOIN
     (SELECT v.*,
             ROW_NUMBER() OVER (PARTITION BY PatientId, Test ORDER BY TestDate DESC) as seqnum
      FROM Vitals v
      WHERE v.TestDate BETWEEN '2015-01-01' AND '2015-12-31' AND
            v.Test LIKE 'BP%'
     ) v
     ON pd.PatientID = v.PatientID 
WHERE seqnum = 1
ORDER BY pd.PatientID, v.TestDate;
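
For anyone who wants to try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch that runs the same query shape against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in, which is an assumption on my part: the column types and ISO date strings are made up for the demo, and SQLite needs to be version 3.25 or later for window functions.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PatientDemographic (PatientID INTEGER PRIMARY KEY,
                                 PatientName TEXT, DateOfBirth TEXT);
CREATE TABLE Vitals (PatientID INTEGER, Test TEXT, Results INTEGER, TestDate TEXT);
INSERT INTO PatientDemographic VALUES
    (1000, 'John Smith', '1955-01-01'),
    (9999, 'Jane Doe',   '1970-05-04');
INSERT INTO Vitals VALUES
    (1000, 'BP - Diastolic', 120, '2015-02-10'),
    (1000, 'BP - Systolic',   70, '2015-02-10'),
    (1000, 'BP - Diastolic', 128, '2015-07-12'),
    (1000, 'BP - Systolic',   75, '2015-07-12'),
    (1000, 'BP - Diastolic', 130, '2015-10-22'),
    (1000, 'BP - Systolic',   76, '2015-10-22'),
    (9999, 'BP - Diastolic', 130, '2015-04-02'),
    (9999, 'BP - Systolic',   60, '2015-04-02'),
    (9999, 'BP - Diastolic', 127, '2015-11-20'),
    (9999, 'BP - Systolic',   65, '2015-11-20');
""")

# Number each patient's rows per test, newest first, then keep only row 1.
latest_rows = conn.execute("""
SELECT pd.PatientID, pd.PatientName, v.Test, v.Results, v.TestDate
FROM PatientDemographic pd
JOIN (SELECT v.*,
             ROW_NUMBER() OVER (PARTITION BY PatientID, Test
                                ORDER BY TestDate DESC) AS seqnum
      FROM Vitals v
      WHERE v.TestDate BETWEEN '2015-01-01' AND '2015-12-31'
        AND v.Test LIKE 'BP%') v
  ON pd.PatientID = v.PatientID
WHERE v.seqnum = 1
ORDER BY pd.PatientID, v.Test
""").fetchall()

for row in latest_rows:
    print(row)
```

With the sample data this should leave one diastolic and one systolic row per patient, dated 2015-10-22 for patient 1000 and 2015-11-20 for patient 9999.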

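I shy away from the window functions that Gordon uses. A technique using a correlated subquery will also get the job done (here 'A' and 'B' stand in for the two test names):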

SELECT 
    ID
    ,Name
    ,DOB
    ,Test
    ,Results
    ,[Date]
FROM
    Vitals AS V
WHERE
    V.[Date] = (SELECT MAX([Date]) FROM Vitals W WHERE W.Name = V.Name AND W.Test = 'A')
    AND V.Test = 'A'

UNION

SELECT 
    ID
    ,Name
    ,DOB
    ,Test
    ,Results
    ,[Date]
FROM
    Vitals AS V
WHERE
    V.[Date] = (SELECT MAX([Date]) FROM Vitals W WHERE W.Name = V.Name AND W.Test = 'B')
    AND V.Test = 'B'
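
The same correlated-subquery idea can probably be written without the UNION by correlating on the test name as well. The sketch below is only an illustration under some assumptions: it uses the question's Vitals column names (PatientID, Test, Results, TestDate), matches on PatientID rather than Name because two patients can share a name, and runs on Python's sqlite3 instead of SQL Server.

```python
import sqlite3

# In-memory SQLite table standing in for the question's Vitals table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vitals (PatientID INTEGER, Test TEXT, Results INTEGER, TestDate TEXT);
INSERT INTO Vitals VALUES
    (1000, 'BP - Diastolic', 120, '2015-02-10'),
    (1000, 'BP - Systolic',   70, '2015-02-10'),
    (1000, 'BP - Diastolic', 130, '2015-10-22'),
    (1000, 'BP - Systolic',   76, '2015-10-22'),
    (9999, 'BP - Diastolic', 130, '2015-04-02'),
    (9999, 'BP - Systolic',   60, '2015-04-02'),
    (9999, 'BP - Diastolic', 127, '2015-11-20'),
    (9999, 'BP - Systolic',   65, '2015-11-20');
""")

# For every row, look up the latest 2015 date for the same patient and test,
# and keep the row only if its own date equals that maximum.
subquery_rows = conn.execute("""
SELECT v.PatientID, v.Test, v.Results, v.TestDate
FROM Vitals v
WHERE v.Test LIKE 'BP%'
  AND v.TestDate = (SELECT MAX(w.TestDate)
                    FROM Vitals w
                    WHERE w.PatientID = v.PatientID
                      AND w.Test = v.Test
                      AND w.TestDate BETWEEN '2015-01-01' AND '2015-12-31')
ORDER BY v.PatientID, v.Test
""").fetchall()

for row in subquery_rows:
    print(row)
```

Because the subquery is keyed on both PatientID and Test, one statement covers systolic and diastolic rows together.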

This works on MS SQL Server 2005+:

SELECT * FROM (
SELECT row_number() over(partition by pd.PatientID, v.Test order by v.TestDate desc) as rn,
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    v.TestDate as Date
FROM PatientDemographic pd 
JOIN Vitals v ON pd.PatientID = v.PatientID
WHERE v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%') t
WHERE rn = 1

Window functions can be less efficient than a NOT EXISTS clause. I'd like to suggest a solution that avoids window functions and may be faster:

SELECT 
    pd.PatientID as [Patient ID],
    pd.PatientName as Name,
    pd.DateOfBirth as DOB,
    v.Test as Test,
    v.Results as Results,
    v.TestDate as Date
FROM PatientDemographic pd JOIN Vitals v ON pd.PatientID = v.PatientID
WHERE 
    v.TestDate BETWEEN '01/01/2015' AND '12/31/2015'
    AND v.Test LIKE 'BP%'
    AND NOT EXISTS (
       SELECT 1 FROM Vitals as v2 where v2.PatientID = v.PatientID
       AND V2.TestDate BETWEEN '01/01/2015' AND '12/31/2015' 
       AND v2.Test LIKE 'BP%' 
       AND v2.TestDate > v.TestDate)
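
One thing worth checking with the NOT EXISTS version is what it correlates on. As written it matches only on PatientID, so it returns whatever BP rows exist on the patient's latest BP date. The sketch below (Python's sqlite3 as a stand-in for SQL Server, with a made-up extra systolic-only reading on 2015-12-01) compares that with a variant that also matches on Test. The helper name latest_bp and the extra row are my own assumptions, not part of the original data.

```python
import sqlite3

# In-memory SQLite table standing in for the question's Vitals table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vitals (PatientID INTEGER, Test TEXT, Results INTEGER, TestDate TEXT);
INSERT INTO Vitals VALUES
    (1000, 'BP - Diastolic', 128, '2015-07-12'),
    (1000, 'BP - Systolic',   75, '2015-07-12'),
    (1000, 'BP - Diastolic', 130, '2015-10-22'),
    (1000, 'BP - Systolic',   76, '2015-10-22'),
    (1000, 'BP - Systolic',   80, '2015-12-01');  -- hypothetical systolic-only visit
""")

def latest_bp(match_test):
    """Return rows that have no later BP row for the same patient
    (and, if match_test is True, for the same test as well)."""
    extra = "AND v2.Test = v.Test" if match_test else ""
    return conn.execute(f"""
        SELECT v.PatientID, v.Test, v.Results, v.TestDate
        FROM Vitals v
        WHERE v.TestDate BETWEEN '2015-01-01' AND '2015-12-31'
          AND v.Test LIKE 'BP%'
          AND NOT EXISTS (
                SELECT 1 FROM Vitals v2
                WHERE v2.PatientID = v.PatientID
                  AND v2.TestDate BETWEEN '2015-01-01' AND '2015-12-31'
                  AND v2.Test LIKE 'BP%'
                  AND v2.TestDate > v.TestDate
                  {extra})
        ORDER BY v.PatientID, v.Test
    """).fetchall()

per_visit = latest_bp(match_test=False)  # latest visit only
per_test = latest_bp(match_test=True)    # latest reading of each test

print(per_visit)
print(per_test)
```

Which variant is right depends on whether "most recent" means the latest visit or the latest reading of each test. When both values are always entered on the same date, the two give the same rows.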

You can also use a Common Table Expression (CTE) to achieve this.

    IF OBJECT_ID('tempdb..#RecentPatientVitals') IS NOT NULL
        DROP TABLE #RecentPatientVitals;
    GO

    CREATE TABLE #RecentPatientVitals
        (
          Patient_ID INT
        , Name VARCHAR(100)
        , DOB DATE
        , Test VARCHAR(150)
        , Results INT
        , [Date] DATE
        );

    INSERT  INTO #RecentPatientVitals
            ( Patient_ID, Name, DOB, Test, Results, [Date] )
    VALUES  ( 1000, 'John Smith', '1/1/1955', 'BP - Diastolic', 120, '2/10/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Systolic', 70, '2/10/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Diastolic', 128, '7/12/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Systolic', 75, '7/12/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Diastolic', 130, '10/22/2015' )
    ,       ( 1000, 'John Smith', '1/1/1955', 'BP - Systolic', 76, '10/22/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Diastolic', 130, '4/2/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Systolic', 60, '4/2/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Diastolic', 127, '11/20/2015' )
    ,       ( 9999, 'Jane Doe', '5/4/1970', 'BP - Systolic', 65, '11/20/2015' );

    SELECT  *
    FROM    #RecentPatientVitals;

    WITH    PatVitals1
              AS ( SELECT   Patient_ID
                          , Name
                          , DOB
                          , Test
                          , MAX(Date) AS Date
                   FROM     #RecentPatientVitals
                   GROUP BY Patient_ID
                          , Name
                          , DOB
                          , Test
                 ) ,
            PatVitals2
              AS ( SELECT   Patient_ID
                          , Test
                          , Results
                          , Date
                   FROM     #RecentPatientVitals
                 )
        SELECT  P1.Patient_ID
              , P1.Name
              , P1.DOB
              , P1.Test
              , P2.Results
              , P1.Date
        FROM    PatVitals1 P1
                INNER JOIN PatVitals2 P2
                ON P2.Patient_ID = P1.Patient_ID
                   AND P2.Date = P1.Date
                   AND P2.Test = P1.Test
        GROUP BY P1.Patient_ID
              , P1.Name
              , P1.DOB
              , P1.Test
              , P2.Results
              , P1.Date;
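
The core of the CTE approach is a GROUP BY that leaves Results out of the grouping, followed by a join back to the detail rows to pick up the result for that date. Here is a pared-down sketch of that pattern, again using Python's sqlite3 as an assumed stand-in for SQL Server. The CTE name LatestDate and the table layout are illustrative choices of mine.

```python
import sqlite3

# In-memory SQLite table standing in for the question's Vitals table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vitals (PatientID INTEGER, Test TEXT, Results INTEGER, TestDate TEXT);
INSERT INTO Vitals VALUES
    (1000, 'BP - Diastolic', 120, '2015-02-10'),
    (1000, 'BP - Systolic',   70, '2015-02-10'),
    (1000, 'BP - Diastolic', 130, '2015-10-22'),
    (1000, 'BP - Systolic',   76, '2015-10-22'),
    (9999, 'BP - Diastolic', 130, '2015-04-02'),
    (9999, 'BP - Systolic',   60, '2015-04-02'),
    (9999, 'BP - Diastolic', 127, '2015-11-20'),
    (9999, 'BP - Systolic',   65, '2015-11-20');
""")

# Step 1 (CTE): latest date per patient and test; Results is deliberately
# not grouped, otherwise every distinct result would form its own group.
# Step 2: join back to Vitals to fetch the result recorded on that date.
cte_rows = conn.execute("""
WITH LatestDate AS (
    SELECT PatientID, Test, MAX(TestDate) AS TestDate
    FROM Vitals
    WHERE Test LIKE 'BP%'
      AND TestDate BETWEEN '2015-01-01' AND '2015-12-31'
    GROUP BY PatientID, Test
)
SELECT v.PatientID, v.Test, v.Results, v.TestDate
FROM Vitals v
JOIN LatestDate ld
  ON ld.PatientID = v.PatientID
 AND ld.Test = v.Test
 AND ld.TestDate = v.TestDate
ORDER BY v.PatientID, v.Test
""").fetchall()

for row in cte_rows:
    print(row)
```

This also shows why the GROUP BY attempt in the question kept asking for more columns: anything selected but not aggregated has to be grouped, so the value you want from the latest row is easier to fetch with a join afterwards.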