Painfully slow query, what are my options?
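I'm doing some SQL work at a hospital (no COVID cases yet!). There is a table, [dbo].A.diagnosis, that contains the history of all diagnoses for all of our patients. I'm no expert, but the table is... not good. It is used by the ancient software we rely on here for handling diagnoses (among other things). The table has 30+ columns and 300k+ rows, but no indexes other than the primary key. Every time a patient's diagnoses are updated, all of their diagnoses are rewritten to the table under a new diagnosis_date. diagnosis_date is stored as date rather than datetime, yet it is not uncommon for a patient's diagnoses to be updated several times in one day.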
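I need a list of the current diagnoses for all of our currently admitted patients, and I need to keep it reasonably up to date (within the past 24 hours is reasonable, but sooner is better).
My best query so far still varies a lot in run time, taking anywhere from 1 to 15(!!!) minutes. That is not acceptable, so I'm wondering what my options are for improving it.
Sample data (made up, relevant columns only):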
-- [dbo].A.diagnosis
+------------+----------------+----------------+----------------+-----------------------------+
| patient_id | diagnosis_type | diagnosis_date | diagnosis_code | diagnosis_text |
+------------+----------------+----------------+----------------+-----------------------------+
| 0369344991 | I | 2020-01-04 | E669 | Obesity, unspecified |
| 0369344991 | I | 2020-01-04 | M545 | Low back pain |
| 0369344991 | I | 2020-01-04 | NULL | NULL | -- Separator
| 0369344991 | U | 2020-01-04 | E669 | Obesity, unspecified |
| 0369344991 | U | 2020-01-04 | M545 | Low back pain |
| 0369344991 | U | 2020-01-04 | L709 | Acne, unspecified | -- Updated later that day to add the acne diagnosis
| 0369344991 | U | 2020-01-04 | NULL | NULL |
| 0369344991 | U | 2020-01-16 | E669 | Obesity, unspecified |
| 0369344991 | U | 2020-01-16 | L709 | Acne, unspecified |
| 0369344991 | U | 2020-01-16 | E785 | Hyperlipidemia, unspecified | -- Updated 12 days later, low back pain resolved, added hyperlipidemia
| 0369344991 | U | 2020-01-16 | NULL | NULL |
+------------+----------------+----------------+----------------+-----------------------------+
-- [dbo].A.patients
+------------+
| patient_id |
+------------+
| 0369344991 |
+------------+
-- [dbo].B.diagnosis_priority
+----------------+--------------------+
| diagnosis_type | diagnosis_priority |
+----------------+--------------------+
| I | 1 |
| A | 2 |
| U | 3 |
| D | 4 |
+----------------+--------------------+
Query:
SELECT DISTINCT dx.patient_id -- (decimal(10,0), null)
, dx.diagnosis_date -- (date, null)
, dx.diagnosis_code -- (varchar(5), null)
, dx.diagnosis_text -- (varchar(253), null)
, dx.diagnosis_type -- (varchar(1), null)
FROM [dbo].A.patients -- Starting with a list of our current patients.
JOIN [dbo].A.diagnosis dx
ON [dbo].A.patients.patient_id = dx.patient_id
JOIN [dbo].B.diagnosis_priority dp
ON dx.diagnosis_type = dp.diagnosis_type
-- This is a table I wrote to help determine which diagnoses are more 'up-to-date' if multiple updates are done on
-- a single day. The join assigns a priority number to each diagnosis_type as diagnosis_priority.
WHERE dx.diagnosis_code IS NOT NULL
AND dx.diagnosis_date = ( -- Trying to get the diagnoses as of the most recent diagnosis date.
SELECT MAX(dx_a.diagnosis_date)
FROM [dbo].A.diagnosis dx_a
WHERE dx_a.patient_id = dx.patient_id
)
AND dp.diagnosis_priority = (
-- Trying to get the highest priority diagnoses applied on the most recent date.
-- A patient will not get a lower priority diagnosis on a later date, but newer diagnoses will not
-- necessarily get a higher priority in [dbo].A.diagnosis
SELECT MAX(dp_a.diagnosis_priority)
FROM [dbo].A.diagnosis dx_a
JOIN [dbo].B.diagnosis_priority dp_a
ON dx_a.diagnosis_type = dp_a.diagnosis_type
WHERE dx_a.patient_id = dx.patient_id
)
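I am a member of db_datareader on [dbo].A, but I am db_owner on [dbo].B on the same server. Changing how [dbo].A.diagnosis works is not feasible because of the ancient software mentioned above.
If the query cannot be improved significantly, I'd like to know what options I have in [dbo].B for maintaining a list of current diagnoses for the patients currently in the hospital.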
Stream all the data into a temp table and run your query against the temp table.
CREATE TABLE #diagnosis_tmp (patient_id decimal(10,0), diagnosis_type varchar(1), diagnosis_date date, diagnosis_code varchar(5), diagnosis_text varchar(253));
-- Copy only the columns the query needs, skipping the NULL separator rows.
INSERT INTO #diagnosis_tmp (patient_id, diagnosis_type, diagnosis_date, diagnosis_code, diagnosis_text)
SELECT patient_id, diagnosis_type, diagnosis_date, diagnosis_code, diagnosis_text
FROM [dbo].A.diagnosis
WHERE diagnosis_code IS NOT NULL;
-- Optional: uncomment to index the temp table if the query against it is still slow.
--CREATE INDEX i_patient_date ON #diagnosis_tmp (patient_id,diagnosis_date)
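For illustration, here is one way the query could then be run against the temp table. This is only a sketch, not the query from the question: it replaces the two correlated subqueries with a single DENSE_RANK() window, and it assumes that "current" means the highest-priority diagnosis_type on the latest diagnosis_date per patient. The alias names (t, p, dp, rnk) and the CTE name ranked are made up for this example; the table names are copied from the question as written. Because the separator rows were filtered out when loading the temp table, the latest date here is the latest date that has at least one non-NULL diagnosis_code.

```sql
-- Sketch: assumes #diagnosis_tmp was created and loaded as shown above.
WITH ranked AS (
    SELECT t.patient_id
         , t.diagnosis_date
         , t.diagnosis_code
         , t.diagnosis_type
           -- Rank 1 = latest date for the patient, then highest priority on that date.
           -- Rows that tie on both values share the same rank.
         , DENSE_RANK() OVER (
               PARTITION BY t.patient_id
               ORDER BY t.diagnosis_date DESC, dp.diagnosis_priority DESC
           ) AS rnk
    FROM #diagnosis_tmp t
    JOIN [dbo].A.patients p                 -- restrict to current patients
      ON p.patient_id = t.patient_id
    JOIN [dbo].B.diagnosis_priority dp      -- priority lookup from the question
      ON dp.diagnosis_type = t.diagnosis_type
)
SELECT DISTINCT patient_id
     , diagnosis_date
     , diagnosis_code
     , diagnosis_type
FROM ranked
WHERE rnk = 1;
```

Since the temp table is read once and ranked in a single pass, this form avoids re-scanning the unindexed source table for every patient. Whether it is actually faster on your server is something to check against the execution plan.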