SQL: Return the records where the value in a column has changed
I have data in a Hive table that looks like this:
VIN | Mode | event | Start | End |
---|---|---|---|---|
ABC123456789 | Mode 1 | Deauthorized | 01/01/2010 00:00:00 | 05/05/2014 14:54:54 |
ABC123456789 | Mode 1 | Deauthorized | 05/05/2014 14:54:54 | 05/13/2014 19:09:51 |
ABC123456789 | Mode 1 | Deauthorized | 05/13/2014 19:09:51 | 11/13/2014 22:26:32 |
ABC123456789 | Mode 1 | Authorized | 11/13/2014 22:26:32 | 11/13/2014 22:31:00 |
ABC123456789 | Mode 1 | Authorized | 11/13/2014 22:31:00 | 11/14/2014 01:23:56 |
ABC123456789 | Mode 2 | Deauthorized | 11/14/2014 01:23:56 | 11/18/2014 19:38:51 |
ABC123456789 | Mode 2 | Deauthorized | 11/18/2014 19:38:51 | 11/18/2014 19:38:54 |
ABC123456789 | Mode 2 | Deauthorized | 11/18/2014 19:38:54 | 11/18/2014 20:07:52 |
ABC123456789 | Mode 2 | Authorized | 11/18/2014 20:07:52 | 12/17/2014 19:22:50 |
ABC123456789 | Mode 2 | Authorized | 12/17/2014 19:22:50 | 02/25/2015 20:03:44 |
ABC123456789 | Mode 2 | Authorized | 02/25/2015 20:03:44 | 02/25/2015 20:03:48 |
ABC123456789 | Mode 3 | Authorized | 02/25/2015 20:03:48 | 02/25/2015 20:14:05 |
ABC123456789 | Mode 3 | Deauthorized | 02/25/2015 20:14:05 | 02/25/2015 20:14:29 |
ABC123456789 | Mode 3 | Deauthorized | 02/25/2015 20:14:29 | 02/25/2015 20:40:21 |
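I want a summary containing only the rows where the value in the event column differs from the value in the previous row. The rows are ordered by the Start timestamp, ascending. I tried window functions, but could not get them to work. The result should look like the table below. Can you suggest an efficient solution?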
VIN | Mode | event | Start | End |
---|---|---|---|---|
ABC123456789 | Mode 1 | Deauthorized | 01/01/2010 00:00:00 | 05/05/2014 14:54:54 |
ABC123456789 | Mode 1 | Authorized | 11/13/2014 22:26:32 | 11/13/2014 22:31:00 |
ABC123456789 | Mode 2 | Deauthorized | 11/14/2014 01:23:56 | 11/18/2014 19:38:51 |
ABC123456789 | Mode 2 | Authorized | 11/18/2014 20:07:52 | 12/17/2014 19:22:50 |
ABC123456789 | Mode 3 | Deauthorized | 02/25/2015 20:14:05 | 02/25/2015 20:14:29 |
You can use lag():

    select t.*
    from (select t.*,
                 lag(event) over (partition by vin order by start) as prev_event
          from t
         ) t
    where prev_event is null or prev_event <> event;

This looks for changes in event over time within each vin. I'm not sure whether mode is also relevant. If it is, add it to the partition by clause.
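
To check the lag() approach against the sample data, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python. SQLite is only a stand-in for Hive here, and it needs version 3.25 or later for window functions. The column names start_ts and end_ts, the helper to_iso, and the function changed_rows are illustrative assumptions, not part of the original question. The sketch also assumes the timestamps are stored as MM/DD/YYYY text, which does not sort chronologically as a string, so it converts them to ISO format before ordering. If the Hive column is a string rather than a timestamp, a similar conversion is probably needed there (for example with unix_timestamp(start, 'MM/dd/yyyy HH:mm:ss')).

```python
import sqlite3
from datetime import datetime

# Sample rows from the question: (vin, mode, event, start, end)
ROWS = [
    ("ABC123456789", "Mode 1", "Deauthorized", "01/01/2010 00:00:00", "05/05/2014 14:54:54"),
    ("ABC123456789", "Mode 1", "Deauthorized", "05/05/2014 14:54:54", "05/13/2014 19:09:51"),
    ("ABC123456789", "Mode 1", "Deauthorized", "05/13/2014 19:09:51", "11/13/2014 22:26:32"),
    ("ABC123456789", "Mode 1", "Authorized", "11/13/2014 22:26:32", "11/13/2014 22:31:00"),
    ("ABC123456789", "Mode 1", "Authorized", "11/13/2014 22:31:00", "11/14/2014 01:23:56"),
    ("ABC123456789", "Mode 2", "Deauthorized", "11/14/2014 01:23:56", "11/18/2014 19:38:51"),
    ("ABC123456789", "Mode 2", "Deauthorized", "11/18/2014 19:38:51", "11/18/2014 19:38:54"),
    ("ABC123456789", "Mode 2", "Deauthorized", "11/18/2014 19:38:54", "11/18/2014 20:07:52"),
    ("ABC123456789", "Mode 2", "Authorized", "11/18/2014 20:07:52", "12/17/2014 19:22:50"),
    ("ABC123456789", "Mode 2", "Authorized", "12/17/2014 19:22:50", "02/25/2015 20:03:44"),
    ("ABC123456789", "Mode 2", "Authorized", "02/25/2015 20:03:44", "02/25/2015 20:03:48"),
    ("ABC123456789", "Mode 3", "Authorized", "02/25/2015 20:03:48", "02/25/2015 20:14:05"),
    ("ABC123456789", "Mode 3", "Deauthorized", "02/25/2015 20:14:05", "02/25/2015 20:14:29"),
    ("ABC123456789", "Mode 3", "Deauthorized", "02/25/2015 20:14:29", "02/25/2015 20:40:21"),
]


def to_iso(ts):
    """Convert 'MM/DD/YYYY HH:MM:SS' to 'YYYY-MM-DD HH:MM:SS' so text order matches time order."""
    return datetime.strptime(ts, "%m/%d/%Y %H:%M:%S").strftime("%Y-%m-%d %H:%M:%S")


def changed_rows(rows, partition_by="vin"):
    """Return the rows whose event differs from the previous row in the same partition.

    partition_by is pasted into the SQL text, so pass only trusted column
    names (illustration only, not safe for user input).
    """
    query = f"""
        SELECT vin, mode, event, start_ts, end_ts
        FROM (SELECT t.*,
                     LAG(event) OVER (PARTITION BY {partition_by}
                                      ORDER BY start_ts) AS prev_event
              FROM t) AS x
        WHERE prev_event IS NULL OR prev_event <> event
        ORDER BY start_ts
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (vin TEXT, mode TEXT, event TEXT, start_ts TEXT, end_ts TEXT)"
        )
        conn.executemany(
            "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
            [(v, m, e, to_iso(s), to_iso(en)) for v, m, e, s, en in rows],
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in changed_rows(ROWS):
        print(row)
```

Partitioned by vin alone, the sketch returns the five rows shown in the expected result. Called with partition_by="vin, mode", it returns six rows: the Mode 3 / Authorized row starting 02/25/2015 20:03:48 is added because it is the first row of its partition. The expected table omits that row, so partitioning by vin alone appears to match what the question asks for.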