Removing consecutive duplicates in a PostgreSQL database where the data is in a JSON column
I have a PostgreSQL table called state_data with two columns: datetime and state. The state column is of type jsonb and holds various state data for a given datetime. Here is an example of the table:
datetime | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
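Over time this table will become very large, especially if I increase the sampling frequency, and I really only want to store rows where the temperature differs from the previous row. So the table above would reduce to: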
datetime | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
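I know how to do this when the temperature is in its own column, but is there a straightforward way to delete all consecutive duplicates based on an item inside a json column?
What if I wanted to remove duplicates based on both json items? For example,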
datetime | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}
Use the window function lag():
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
) s
where state->>'temp' is distinct from prev->>'temp'
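For the second part of the question (treating a row as a duplicate only when both temp and location repeat), one possible sketch is to compare the two extracted values as a row. This is an illustrative variant of the query above, not something the original answer spells out; it assumes both keys are present in every row.

```sql
-- Keep a row when either temp or location differs from the previous row.
-- The first row has prev = NULL, so it is always kept.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
) s
where (state->>'temp', state->>'location')
      is distinct from (prev->>'temp', prev->>'location');
```

On the sample data this keeps the rows at 08:00, 08:03, 08:04, 08:05 and 08:06, which matches the second expected table in the question.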
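If the table has a primary key, you should use it in the delete command. Without a primary key, you can match on (datetime, state), casting state to jsonb so the values can be compared (the cast is a no-op if the column is already jsonb):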
delete from state_data
where (datetime, state::jsonb) not in (
    select datetime, state::jsonb
    from (
        select datetime, state, lag(state) over (order by datetime) as prev
        from state_data
    ) s
    where state->>'temp' is distinct from prev->>'temp'
)
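As a hedged alternative, the same cleanup can be written so that it targets the rows to remove directly instead of using NOT IN. The sketch below assumes (this is not stated in the question) that datetime is unique per row and can stand in for a key; the aliases d and s are illustrative.

```sql
-- Assumption: datetime is unique, so it identifies a single row.
-- Delete every row whose temp equals the temp of the previous row.
delete from state_data d
using (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
) s
where d.datetime = s.datetime
  and s.prev is not null   -- never delete the first row
  and s.state->>'temp' is not distinct from s.prev->>'temp';
```

It may be worth running the corresponding select first, or wrapping the delete in a transaction, to confirm that only the intended rows are affected.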