Removing consecutive duplicates in a PostgreSQL database where the data is in a JSON column

So I have a PostgreSQL table named state_data with two columns: datetime and state. The state column is of type jsonb and holds various state data for a given datetime. Here is an example of the table:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

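Over time this table will get very large, especially if I increase the sampling frequency, and I really only want to store rows whose temperature differs from that of the preceding row. So the table above would reduce to: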

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

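I know how to do this if the temperature data were in its own column, but is there a straightforward way to delete all consecutive duplicates based on an item inside a JSON column?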

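What if I wanted to remove consecutive duplicates based on two JSON items (temp and location)? For example, the table would then reduce to: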

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

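Use the window function lag() to compare each row with the previous one; this query returns the rows you want to keep: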

select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'
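
For the second part of the question (two JSON items), the same lag() approach can probably be extended by comparing row constructors. This is only a sketch under the assumption that "duplicate" means both temp and location are unchanged from the previous row; it uses no names beyond those in the question.

```sql
-- Keep a row when either temp or location differs from the previous row.
-- "is distinct from" treats NULL as a comparable value, so the first row
-- (whose prev is NULL) is always kept.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where (state->>'temp', state->>'location')
      is distinct from (prev->>'temp', prev->>'location');
```

Note that ->> returns text, so values are compared as strings: 75.0 and 75 would count as different.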

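If the table has a primary key, you should use it in the delete command. In the absence of a primary key, you can identify rows by the pair (datetime, state), casting state to jsonb. The cast matters when the column is of type json, which has no equality operator; it is harmless if the column is already jsonb: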

delete from state_data
where (datetime, state::jsonb) not in (
    select datetime, state::jsonb
    from (
        select datetime, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    where state->>'temp' is distinct from prev->>'temp'
)
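
If the table does have a primary key, the delete could look roughly like the following. This is a sketch that assumes a hypothetical primary key column named id, which is not part of the table shown in the question. It inverts the condition and deletes the duplicates directly instead of excluding the rows to keep.

```sql
-- Assumption: state_data has a primary key column "id" (hypothetical).
-- Delete every row whose temp equals the temp of the previous row.
delete from state_data
where id in (
    select id
    from (
        select id, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    -- "is not distinct from" is the NULL-safe form of "=".
    where state->>'temp' is not distinct from prev->>'temp'
);
```

It is probably worth running the inner select on its own first, or running the delete inside a transaction, to check which rows would be removed.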