Create a table from a SELECT query with LEAD/LAG

This is the kind of table I have at hand:

 SELECT * FROM smf_table LIMIT 20;
   id    | trip_id | segment_id | segment_start_timestamp | timestamp  |     lat     |     lon     | travelmode 
---------+---------+------------+-------------------------+------------+-------------+-------------+------------
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459467971 |  41.1523521 |  -8.6097233 |          0
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459468020 |  41.1523518 |  -8.6097168 |          0
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459468026 |  41.1524153 |  -8.6097054 |          0
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459468031 |  41.1524057 |   -8.609701 |          0
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459468036 |  41.1523647 |  -8.6097146 |          0
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459468041 |  41.1525607 |  -8.6096725 |          0
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459468046 |  41.1525077 |  -8.6096843 |          0
 5338113 |  533811 |          3 | 2016-04-01 00:47:16+01  | 1459468051 |  41.1524966 |  -8.6096833 |          0
 5338151 |  533815 |          1 | 2016-04-01 00:06:40+01  | 1459465282 | 41.14454009 | -8.56292593 |          3
 5338151 |  533815 |          1 | 2016-04-01 00:06:40+01  | 1459465412 |    41.14454 |  -8.5629259 |          3
 5338151 |  533815 |          1 | 2016-04-01 00:06:40+01  | 1459465600 |   41.163172 |  -8.5838214 |          3

This is a large table with over 100 million rows. I want to create a new table, temp_table, from a filtered version of smf_table, so that the new table:

  1. does not include rows where travelmode IS NULL (there are a lot of these)
  2. does not include rows where row2_timestamp - row1_timestamp = 0.

So I thought of using a subquery like this:

CREATE TABLE temp_table
AS
WITH cte AS
(SELECT LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp) 
  - LAG(timestamp) OVER (PARTITION BY id ORDER BY timestamp) 
FROM smf_table
) 
SELECT id,
  lat,
  lon,
  timestamp,
  travel mode
FROM smf_table
WHERE travelmode IS NOT NULL AND cte !=0;

ERROR:  relation "smf_table" does not exist
LINE 13: FROM smf_table

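You should not be getting an error saying that smf_table does not exist. You would probably get other errors instead: cte is not defined, the column in the CTE has no name, travel is not defined.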

You need to select from the CTE to use its columns. CTEs are like tables/views, not columns:

WITH cte AS (
      SELECT s.*,
             LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp) - LAG(timestamp) OVER (PARTITION BY id ORDER BY timestamp) as diff
      FROM smf_table s
     ) 
SELECT id, lat, lon, timestamp, travelmode
FROM cte
WHERE travelmode IS NOT NULL AND diff <> 0;
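To see what the diff column actually contains, a small sketch like the following may help. It assumes PostgreSQL; the table smf_demo and its rows are made up for illustration and only mimic the relevant columns of smf_table.

```sql
-- Hypothetical sample standing in for smf_table (names and values are invented).
CREATE TEMP TABLE smf_demo (id int, timestamp bigint, travelmode int);

INSERT INTO smf_demo VALUES
    (1, 100, 0),
    (1, 100, 0),     -- same timestamp as the row above
    (1, 105, 0),
    (1, 110, NULL),
    (2, 200, 3);     -- only row for id 2

-- Show LEAD - LAG for every row, before any filtering is applied.
SELECT s.id,
       s.timestamp,
       s.travelmode,
       LEAD(s.timestamp) OVER w - LAG(s.timestamp) OVER w AS diff
FROM smf_demo s
WINDOW w AS (PARTITION BY id ORDER BY timestamp)
ORDER BY s.id, s.timestamp;
```

With this sample, the first and last row of each id get a NULL diff, because LAG() or LEAD() has no neighbour there. Since NULL <> 0 is not true, the filter diff <> 0 would drop those rows as well. Neither of the two rows sharing timestamp 100 gets a diff of 0, because LEAD() - LAG() skips over the current row. Whether that matches the intent of the question is worth checking before running it on the full table.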

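You have to select the required columns from the table and perform the calculation in the CTE, and when computing the final result you have to select from the CTE rather than from the original table. You can also create the table directly, as shown below, without a CTE.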

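CREATE TABLE temp_table AS
SELECT id, lat, lon, timestamp, travelmode
FROM (
    -- A window function result cannot be filtered in the WHERE clause of the
    -- same query level, so it is computed in a derived table first.
    SELECT s.*,
           LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp)
         - LAG(timestamp) OVER (PARTITION BY id ORDER BY timestamp) AS date_time
    FROM smf_table s
) t
WHERE travelmode IS NOT NULL AND date_time <> 0;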

Why are you subtracting LAG() from LEAD()? Do you mean to ignore the current record and compare the following record with the preceding one?

Also, if timestamp1 - timestamp2 = 0 then timestamp1 = timestamp2, so this can be solved with GROUP BY.

CREATE TABLE temp_table
AS
SELECT id,
       max(lat) as lat,
       max(lon) as lon,
       timestamp,
       max(travelmode) as travelmode
  FROM smf_table
 WHERE travelmode IS NOT NULL 
 GROUP by id, timestamp
HAVING count(*) = 1;
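
If the intent was instead to compare each row with the one just before it, a variant using only LAG() could look like the sketch below. It assumes PostgreSQL; smf_demo2 and temp_demo are hypothetical names with made-up rows, and this is one possible reading of the requirement, not necessarily what the question meant.

```sql
-- Hypothetical sample standing in for smf_table (names and values are invented).
CREATE TEMP TABLE smf_demo2 (id int, lat numeric, lon numeric, timestamp bigint, travelmode int);

INSERT INTO smf_demo2 VALUES
    (1, 41.15, -8.60, 100, 0),
    (1, 41.15, -8.60, 100, 0),     -- same timestamp as the row above
    (1, 41.16, -8.61, 105, 0),
    (1, 41.17, -8.62, 110, NULL),  -- removed by the travelmode filter
    (2, 41.14, -8.56, 200, 3);

CREATE TEMP TABLE temp_demo AS
SELECT id, lat, lon, timestamp, travelmode
FROM (
    SELECT s.*,
           -- Difference to the previous row of the same id; NULL for the first row.
           s.timestamp - LAG(s.timestamp) OVER (PARTITION BY id ORDER BY timestamp) AS diff
    FROM smf_demo2 s
    WHERE s.travelmode IS NOT NULL   -- filter first, so neighbours are non-NULL rows
) t
WHERE diff IS DISTINCT FROM 0;       -- true for NULL, so the first row per id is kept

SELECT * FROM temp_demo ORDER BY id, timestamp;
```

Unlike HAVING count(*) = 1, which removes every row of a duplicated (id, timestamp) pair, this keeps one row per pair. Which of the tied rows is kept is arbitrary unless the ORDER BY in the window is extended with a tie-breaker.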