Q: SQLite query to find islands in data (that is, sequential rows where a value has met a certain threshold)
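I have a SQLite database of, let's say, the percentage of dead cells found in cancer cells over time (note: the time column values have been changed to simple numbers for readability).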
id time deadcellspercent
1 000000001000000000 35
2 000000002000000000 54
3 000000003000000000 31
4 000000004000000000 15
5 000000005000000000 38
6 000000006000000000 70
7 000000007000000000 28
8 000000008000000000 13
9 000000009000000000 99
10 000000010000000000 51
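I would like to create a SQLite query that returns the time ranges during which the percentage is at or above a given threshold. For example, with a threshold of >= 20, the query should return: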
ts_start ts_end
000000001000000000 000000003000000000
000000005000000000 000000007000000000
000000009000000000 000000010000000000
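How can I form a query to do this? I have read up on topics such as "SQLite window functions", the "gaps and islands problem", and "analytic functions", but I am a SQL newbie and cannot make heads or tails of it to get the desired result.
Any help would be appreciated.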
A: You're on the right track with window functions and gaps and islands.
First, let's take your sample data and populate a table with it:
CREATE TABLE cells(id INTEGER PRIMARY KEY, time TEXT, deadcellspercent INTEGER);
INSERT INTO cells VALUES(1,'000000001000000000',35);
INSERT INTO cells VALUES(2,'000000002000000000',54);
INSERT INTO cells VALUES(3,'000000003000000000',31);
INSERT INTO cells VALUES(4,'000000004000000000',15);
INSERT INTO cells VALUES(5,'000000005000000000',38);
INSERT INTO cells VALUES(6,'000000006000000000',70);
INSERT INTO cells VALUES(7,'000000007000000000',28);
INSERT INTO cells VALUES(8,'000000008000000000',13);
INSERT INTO cells VALUES(9,'000000009000000000',99);
INSERT INTO cells VALUES(10,'000000010000000000',51);
One possible query (it uses window functions, so it requires a recent version of SQLite, 3.25 or later):
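WITH islands AS (
  SELECT id, time
       -- position among all rows minus position among the rows on the same
       -- side of the threshold; constant within a run of consecutive rows
       , row_number() OVER w1 - row_number() OVER w2 AS diff
       , deadcellspercent >= 20 AS wanted  -- 1 when the row meets the threshold
  FROM cells
  WINDOW w1 AS (ORDER BY time)
       , w2 AS (PARTITION BY deadcellspercent >= 20 ORDER BY time))
SELECT min(time) AS ts_start, max(time) AS ts_end
FROM islands
WHERE wanted = 1  -- keep only the runs that meet the threshold
GROUP BY diff     -- one group per island
ORDER BY diff;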
This produces:
ts_start ts_end
------------------ ------------------
000000001000000000 000000003000000000
000000005000000000 000000007000000000
000000009000000000 000000010000000000
(Heavily influenced by this post on DBA Stack Exchange; see it for an explanation.)
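To see why grouping by the row-number difference isolates each island, it can help to look at the intermediate values. The following is a minimal sketch, not part of the original answer: it loads the sample data into an in-memory database through Python's standard `sqlite3` module and exposes the two row numbers next to their difference. The names `marked_rows`, `islands`, `rn_all`, `rn_part`, and the `:threshold` parameter are illustrative assumptions, and the SQLite library bundled with Python is assumed to be 3.25 or later.

```python
import sqlite3

# Sample data from the question: (id, time, deadcellspercent).
ROWS = [
    (1, "000000001000000000", 35), (2, "000000002000000000", 54),
    (3, "000000003000000000", 31), (4, "000000004000000000", 15),
    (5, "000000005000000000", 38), (6, "000000006000000000", 70),
    (7, "000000007000000000", 28), (8, "000000008000000000", 13),
    (9, "000000009000000000", 99), (10, "000000010000000000", 51),
]

# Same window definitions as the answer's query, with the threshold as a
# named parameter (an assumption added for illustration).
MARKED_SQL = """
    SELECT id, time,
           row_number() OVER w1 AS rn_all,   -- position among all rows
           row_number() OVER w2 AS rn_part,  -- position within its partition
           row_number() OVER w1 - row_number() OVER w2 AS diff,
           deadcellspercent >= :threshold AS wanted
    FROM cells
    WINDOW w1 AS (ORDER BY time),
           w2 AS (PARTITION BY deadcellspercent >= :threshold ORDER BY time)
"""


def _connect():
    """Create an in-memory database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cells(id INTEGER PRIMARY KEY, time TEXT, "
        "deadcellspercent INTEGER)"
    )
    conn.executemany("INSERT INTO cells VALUES(?, ?, ?)", ROWS)
    return conn


def marked_rows(threshold):
    """Return (id, time, rn_all, rn_part, diff, wanted) for every row."""
    conn = _connect()
    try:
        sql = MARKED_SQL + " ORDER BY time"
        return conn.execute(sql, {"threshold": threshold}).fetchall()
    finally:
        conn.close()


def islands(threshold):
    """Return (ts_start, ts_end) for each run of rows meeting the threshold."""
    conn = _connect()
    try:
        sql = f"""
            WITH marked AS ({MARKED_SQL})
            SELECT min(time) AS ts_start, max(time) AS ts_end
            FROM marked
            WHERE wanted = 1
            GROUP BY diff
            ORDER BY ts_start
        """
        return conn.execute(sql, {"threshold": threshold}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in marked_rows(20):
        print(row)
    print(islands(20))
```

With a threshold of 20, `diff` for ids 1 through 10 is 0, 0, 0, 3, 1, 1, 1, 6, 2, 2. Inside a run of consecutive wanted rows both row numbers advance together, so the difference stays fixed; each unwanted row in between advances only `rn_all`, which bumps the difference for the next run. The below-threshold rows get their own differences (3 and 6 here), and in other data those could coincide with an island's value, which is why the `WHERE wanted = 1` filter comes before the grouping.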