Removing consecutive duplicates in a PostgreSQL database where the data is in a JSON column

Question

I have a PostgreSQL table called state_data with two columns: datetime and state. The state column is of type jsonb and holds various state data for the given datetime. Here is an example of the table:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:01:00 | {"temp":75.0,"location":1}
2018-10-31 08:02:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

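This table will grow very large over time, especially as I increase the sampling frequency, and I really only want to store rows whose temperature differs from that of the preceding row. So the table above would reduce to: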

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

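I know how to do this if the temperature data were in its own column, but is there a straightforward way to delete all consecutive duplicates based on an item inside the json column?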

What if I wanted to remove consecutive duplicates based on both json items? For example:

datetime            | state
================================================
2018-10-31 08:00:00 | {"temp":75.0,"location":1}
2018-10-31 08:03:00 | {"temp":75.0,"location":2}
2018-10-31 08:04:00 | {"temp":74.8,"location":1}
2018-10-31 08:05:00 | {"temp":74.8,"location":2}
2018-10-31 08:06:00 | {"temp":74.7,"location":1}

Answer 1

Use the window function lag():

select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'
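
The question also asks about de-duplicating on both json items. One way to sketch that, under the assumption that the two keys are exactly temp and location as in the sample data, is to keep a row whenever either value differs from the previous row. The temporary table below is a throwaway copy of the question's sample rows, added here only so the query can be tried in isolation:

```sql
-- Throwaway copy of the sample data from the question (assumption: same
-- column names and types as the real state_data table).
create temp table state_data (datetime timestamp, state jsonb);

insert into state_data values
    ('2018-10-31 08:00:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:01:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:02:00', '{"temp":75.0,"location":1}'),
    ('2018-10-31 08:03:00', '{"temp":75.0,"location":2}'),
    ('2018-10-31 08:04:00', '{"temp":74.8,"location":1}'),
    ('2018-10-31 08:05:00', '{"temp":74.8,"location":2}'),
    ('2018-10-31 08:06:00', '{"temp":74.7,"location":1}');

-- Keep a row when temp OR location differs from the previous row.
-- For the first row prev is null, so "is distinct from" evaluates to true
-- and the row is kept.
select datetime, state
from (
    select datetime, state, lag(state) over (order by datetime) as prev
    from state_data
    ) s
where state->>'temp' is distinct from prev->>'temp'
   or state->>'location' is distinct from prev->>'location'
order by datetime;
```

On the sample data this keeps the 08:00, 08:03, 08:04, 08:05 and 08:06 rows, which matches the second expected result in the question. Note that ->> returns text, so the comparison is on the textual form of each value.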

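If the table has a primary key, you should use it in the delete command. Without a primary key, you can match on (datetime, state) instead, casting state to jsonb (the cast only matters if the column is of type json, which has no equality operator; on a jsonb column it changes nothing):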

delete from state_data
where (datetime, state::jsonb) not in (
    select datetime, state::jsonb
    from (
        select datetime, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    where state->>'temp' is distinct from prev->>'temp'
)
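
For the primary-key case mentioned above, a minimal sketch might look like the following. The id column is hypothetical (the table in the question does not have one), and the condition is inverted relative to the select: instead of keeping rows that differ from their predecessor, it deletes rows that do not.

```sql
-- Assumption: state_data has a primary key column named id
-- (hypothetical; it is not part of the table shown in the question).
delete from state_data
where id in (
    select id
    from (
        select id, state, lag(state) over (order by datetime) as prev
        from state_data
        ) s
    -- A row is a consecutive duplicate when its temp equals the previous
    -- row's temp. The first row has prev = null, so it is not matched here
    -- as long as it has a temp value.
    where state->>'temp' is not distinct from prev->>'temp'
);
```

It is probably worth running the inner select on its own first, or wrapping the delete in a transaction, to confirm that it targets only the rows you expect.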
