CREATE a table from a select query with LEAD/LAG
This is the kind of table I have:
SELECT * FROM smf_table LIMIT 20;
id | trip_id | segment_id | segment_start_timestamp | timestamp | lat | lon | travelmode
---------+---------+------------+-------------------------+------------+-------------+-------------+------------
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459467971 | 41.1523521 | -8.6097233 | 0
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459468020 | 41.1523518 | -8.6097168 | 0
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459468026 | 41.1524153 | -8.6097054 | 0
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459468031 | 41.1524057 | -8.609701 | 0
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459468036 | 41.1523647 | -8.6097146 | 0
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459468041 | 41.1525607 | -8.6096725 | 0
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459468046 | 41.1525077 | -8.6096843 | 0
5338113 | 533811 | 3 | 2016-04-01 00:47:16+01 | 1459468051 | 41.1524966 | -8.6096833 | 0
5338151 | 533815 | 1 | 2016-04-01 00:06:40+01 | 1459465282 | 41.14454009 | -8.56292593 | 3
5338151 | 533815 | 1 | 2016-04-01 00:06:40+01 | 1459465412 | 41.14454 | -8.5629259 | 3
5338151 | 533815 | 1 | 2016-04-01 00:06:40+01 | 1459465600 | 41.163172 | -8.5838214 | 3
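This is a large table with more than 100 million rows. I want to create a new table, temp_table, from a filtered result of smf_table, so that the new table:
- excludes rows where the travelmode column IS NULL (there are many of them)
- excludes rows where row2_timestamp - row1_timestamp = 0.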
So I thought of using a subquery like this:
CREATE TABLE temp_table
AS
WITH cte AS
(SELECT LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp)
- LAG(timestamp) OVER (PARTITION BY id ORDER BY timestamp)
FROM smf_table
)
SELECT id,
lat,
lon,
timestamp,
travel mode
FROM smf_table
WHERE travelmode IS NOT NULL AND cte !=0;
ERROR: relation "smf_table" does not exist
LINE 13: FROM smf_table
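One way to read the second requirement is as the gap between consecutive rows of the same id, ordered by timestamp. The sketch below assumes that reading and uses Python's built-in sqlite3 module (SQLite 3.25 or later, which supports window functions) as a stand-in for the PostgreSQL table; the miniature data set and the CTE name gaps are hypothetical. It computes timestamp - LAG(timestamp) per id and drops rows whose gap is 0, while keeping each id's first row, whose gap is NULL because it has no predecessor.

```python
import sqlite3

# Hypothetical miniature of smf_table with only the columns the filter needs.
# (id, timestamp, travelmode): timestamp 110 repeats for id 1,
# and id 2's second row has a NULL travelmode.
ROWS = [
    (1, 100, 0),
    (1, 110, 0),
    (1, 110, 0),
    (1, 125, 0),
    (2, 200, 3),
    (2, 300, None),
]


def filtered_rows(rows):
    """Return (id, timestamp) pairs that satisfy both requirements.

    Assumes "row2_timestamp - row1_timestamp" means the gap between a row
    and the previous row of the same id, ordered by timestamp.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE smf_table (id INTEGER, timestamp INTEGER, travelmode INTEGER)")
    con.executemany("INSERT INTO smf_table VALUES (?, ?, ?)", rows)
    query = """
        WITH gaps AS (
            SELECT id, timestamp, travelmode,
                   timestamp - LAG(timestamp) OVER (PARTITION BY id ORDER BY timestamp) AS gap
            FROM smf_table
        )
        SELECT id, timestamp
        FROM gaps
        WHERE travelmode IS NOT NULL
          AND (gap IS NULL OR gap <> 0)  -- keep each id's first row (no predecessor)
        ORDER BY id, timestamp
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(filtered_rows(ROWS))  # [(1, 100), (1, 110), (1, 125), (2, 200)]
```

Exactly one of the two rows at timestamp 110 survives: the second one in timestamp order sees a gap of 0 and is removed.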
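You should not be getting an error that smf_table is undefined. You might get other errors instead: cte is undefined, the column in the CTE has no name, and travel is undefined.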
You need to select from the CTE to use its columns. A CTE is like a table or view, not a column:
WITH cte AS (
SELECT s.*,
LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp) - LAG(timestamp) OVER (PARTITION BY id ORDER BY timestamp) as diff
FROM smf_table s
)
SELECT id, lat, lon, timestamp, travelmode
FROM cte
WHERE travelmode IS NOT NULL AND diff <> 0;
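Because diff is NULL for the first row of each id (LAG has no previous row) and for the last row (LEAD has no next row), and NULL <> 0 is not true, this WHERE clause also drops those rows. The sketch below checks that on the sample rows from the question, using Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25+ assumed). Unlike the query above, its outer SELECT also returns diff so the values can be inspected.

```python
import sqlite3

# Sample rows from the question: (id, timestamp, lat, lon, travelmode).
SAMPLE = [
    (5338113, 1459467971, 41.1523521, -8.6097233, 0),
    (5338113, 1459468020, 41.1523518, -8.6097168, 0),
    (5338113, 1459468026, 41.1524153, -8.6097054, 0),
    (5338113, 1459468031, 41.1524057, -8.609701, 0),
    (5338113, 1459468036, 41.1523647, -8.6097146, 0),
    (5338113, 1459468041, 41.1525607, -8.6096725, 0),
    (5338113, 1459468046, 41.1525077, -8.6096843, 0),
    (5338113, 1459468051, 41.1524966, -8.6096833, 0),
    (5338151, 1459465282, 41.14454009, -8.56292593, 3),
    (5338151, 1459465412, 41.14454, -8.5629259, 3),
    (5338151, 1459465600, 41.163172, -8.5838214, 3),
]

# The CTE query from the answer, with diff added to the outer SELECT.
CTE_QUERY = """
    WITH cte AS (
        SELECT s.*,
               LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp)
             - LAG(timestamp)  OVER (PARTITION BY id ORDER BY timestamp) AS diff
        FROM smf_table s
    )
    SELECT id, lat, lon, timestamp, travelmode, diff
    FROM cte
    WHERE travelmode IS NOT NULL AND diff <> 0
    ORDER BY id, timestamp
"""


def run_cte_query(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE smf_table "
        "(id INTEGER, timestamp INTEGER, lat REAL, lon REAL, travelmode INTEGER)"
    )
    con.executemany("INSERT INTO smf_table VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(CTE_QUERY).fetchall()
    con.close()
    return result


kept = run_cte_query(SAMPLE)
print(len(kept))  # 7 of the 11 sample rows: first and last row of each id are gone
```

If the edge rows should be kept, the condition has to handle NULL explicitly, for example (diff IS NULL OR diff <> 0).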
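You have to select the columns you need from the table and do the computation inside the CTE; when selecting the final result, you must select from cte, not from the original table. You can also create the table directly, as shown below, without a WITH clause.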
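CREATE TABLE temp_table AS
-- A window function cannot be referenced in the WHERE clause of the same SELECT,
-- so the difference is computed in a derived table first.
SELECT id, lat, lon, timestamp, travelmode
FROM (
    SELECT id, lat, lon, timestamp, travelmode,
           LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp)
         - LAG(timestamp)  OVER (PARTITION BY id ORDER BY timestamp) AS diff
    FROM smf_table
) AS t
WHERE travelmode IS NOT NULL AND diff <> 0;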
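In both PostgreSQL and SQLite, a window function such as LEAD() cannot be referenced in the WHERE clause of the SELECT that computes it, so the difference has to be computed in a subquery or CTE before it can be filtered. The sketch below checks this with Python's sqlite3 module (SQLite 3.25+ assumed) on hypothetical data: the direct WHERE form raises an error, while CREATE TABLE ... AS over a derived table works.

```python
import sqlite3

# Hypothetical miniature of smf_table: (id, timestamp, travelmode).
ROWS = [
    (1, 100, 0),
    (1, 130, 0),
    (1, 160, 0),
    (2, 200, None),
]

# LEAD minus LAG, as in the answers above.
DIFF = ("LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp)"
        " - LAG(timestamp) OVER (PARTITION BY id ORDER BY timestamp)")


def demo():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE smf_table (id INTEGER, timestamp INTEGER, travelmode INTEGER)")
    con.executemany("INSERT INTO smf_table VALUES (?, ?, ?)", ROWS)

    # Referencing the window function directly in WHERE is rejected.
    try:
        con.execute(f"SELECT id FROM smf_table WHERE {DIFF} <> 0")
        where_failed = False
    except sqlite3.Error:
        where_failed = True

    # Computing diff in a derived table first, then filtering, works.
    con.execute(f"""
        CREATE TABLE temp_table AS
        SELECT id, timestamp, travelmode
        FROM (SELECT id, timestamp, travelmode, {DIFF} AS diff FROM smf_table) AS t
        WHERE travelmode IS NOT NULL AND diff <> 0
    """)
    rows = con.execute("SELECT id, timestamp FROM temp_table ORDER BY id, timestamp").fetchall()
    con.close()
    return where_failed, rows


print(demo())
```

Only the middle row of id 1 survives here: the edge rows have a NULL diff, and id 2's only row has a NULL travelmode.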
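Why are you subtracting LAG() from LEAD()? Do you mean to skip the current record and compare the following record with the preceding one?
Also, if timestamp1 - timestamp2 = 0, then timestamp1 = timestamp2, so this can be solved with GROUP BY.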
CREATE TABLE temp_table
AS
SELECT id,
max(lat) as lat,
max(lon) as lon,
timestamp,
max(travelmode) as travelmode
FROM smf_table
WHERE travelmode IS NOT NULL
GROUP by id, timestamp
HAVING count(*) = 1;
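With HAVING count(*) = 1, every row that shares an (id, timestamp) pair with another row is discarded, the first occurrence included; and because only single-row groups survive, max() simply returns that one row's value. The sketch below shows the effect using Python's sqlite3 module as a stand-in for PostgreSQL, on hypothetical data with one duplicated timestamp.

```python
import sqlite3

# Hypothetical data: (id, timestamp, lat, lon, travelmode), timestamp 110 appears twice.
ROWS = [
    (1, 100, 41.10, -8.60, 0),
    (1, 110, 41.11, -8.61, 0),
    (1, 110, 41.12, -8.62, 0),
    (1, 125, 41.13, -8.63, 0),
]

# The GROUP BY answer's query, with ORDER BY added for a stable result.
GROUP_BY_QUERY = """
    SELECT id, max(lat) AS lat, max(lon) AS lon, timestamp, max(travelmode) AS travelmode
    FROM smf_table
    WHERE travelmode IS NOT NULL
    GROUP BY id, timestamp
    HAVING count(*) = 1
    ORDER BY id, timestamp
"""


def timestamps_kept(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE smf_table "
        "(id INTEGER, timestamp INTEGER, lat REAL, lon REAL, travelmode INTEGER)"
    )
    con.executemany("INSERT INTO smf_table VALUES (?, ?, ?, ?, ?)", rows)
    # Column 3 of each result row is timestamp.
    kept = [row[3] for row in con.execute(GROUP_BY_QUERY)]
    con.close()
    return kept


print(timestamps_kept(ROWS))  # [100, 125]: both rows at timestamp 110 are dropped
```

If one row of each duplicated pair should survive instead, a filter on the gap to the previous row (via LAG) matches that intent more closely than HAVING count(*) = 1.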