Clean string in PostgreSQL

Question

I have a column in a table that holds data about any updates related to a company's changes, in the following format:

#=============#==============#================#
| Company ID  |  updated_at  |   updates      |
#=============#==============#================#
| 101         | 2020-11-01   | name:          |
|             |              | -ABC           |
|             |              | -XYZ           |
|             |              | url:           |
|             |              | -www.abc.com   |
|             |              | -www.xyz.com   |
+-------------+--------------+----------------+
| 109         | 2020-10-20   | rating:        |
|             |              | -4.5           |
|             |              | -4.0           |
+-------------+--------------+----------------+

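As you can see above, the updates column contains strings with newline characters, each describing one or more updates. In the example above, for Company ID 101 the name changed from ABC to XYZ and the url changed from www.abc.com to www.xyz.com. For Company ID 109, only the rating changed, from 4.5 to 4.0.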

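However, I want to split the updates column into 3 columns: the first should contain what changed (url, name, etc.), the second the old value, and the third the new value. Like this: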

#============#============#==============#================#
| Company ID |   Field    |  Old Value   |   New Value    |
#============#============#==============#================#
| 101        |   name     | ABC          | XYZ            |
+------------+------------+--------------+----------------+
| 101        |   url      | www.abc.com  | www.xyz.com    |
+------------+------------+--------------+----------------+
| 109        |   rating   | 4.5          | 4.0            |
+------------+------------+--------------+----------------+

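I'm doing this in Postgres and know how to extract a substring based on characters, but this is a bit complicated for me because I need to extract multiple substrings from the same column in each row. Any help would be appreciated. Thanks!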

Answer 1

First, you can use regexp_split_to_table with a regular expression containing a positive lookahead to get a version of your table in which each row contains only one update:

select companyID, 
       updated_at, 
       regexp_split_to_table(updates, '\n(?=\y.+:)') as updates 
  from old;

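This splits the updates column at every newline (\n) that is followed by a word boundary, one or more characters, and a colon (\y.+:). Because the value lines start with -, there is no word boundary right after the newline that precedes them, so those newlines are left alone.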

#=============#==============#================#
| companyID   |  updated_at  |   updates      |
#=============#==============#================#
| 101         | 2020-11-01   | name:          |
|             |              | -ABC           |
|             |              | -XYZ           |
+-------------+--------------+----------------+
| 101         | 2020-11-01   | url:           |
|             |              | -www.abc.com   |
|             |              | -www.xyz.com   |
+-------------+--------------+----------------+
| 109         | 2020-10-20   | rating:        |
|             |              | -4.5           |
|             |              | -4.0           |
+-------------+--------------+----------------+
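
To try the split yourself, a minimal setup might look like the sketch below. The table name old comes from the query above; the column types and the way the newlines are stored (written here as E'' escape strings) are assumptions about your data.

```sql
-- Hypothetical sample table matching the question's data;
-- the column types are assumptions.
create table old (
    companyID  integer,
    updated_at date,
    updates    text
);

-- E'...' strings turn \n into a real newline character.
insert into old (companyID, updated_at, updates) values
    (101, '2020-11-01', E'name:\n-ABC\n-XYZ\nurl:\n-www.abc.com\n-www.xyz.com'),
    (109, '2020-10-20', E'rating:\n-4.5\n-4.0');

-- One output row per update block: 2 rows for company 101, 1 for 109.
select companyID,
       updated_at,
       regexp_split_to_table(updates, '\n(?=\y.+:)') as updates
  from old;
```

The pattern is written as an ordinary string, so the backslashes reach the regex engine unchanged as long as standard_conforming_strings is on (the default in current versions).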

From this, it is easier to build the table you want. For example, you can use split_part to split each update string into the three parts you need.
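
As a small illustration of how split_part behaves on one such update string (the literal below is just the first block from the example data, typed in by hand):

```sql
-- split_part(string, delimiter, n) returns the n-th field (1-based).
select split_part(E'name:\n-ABC\n-XYZ', ':',    1) as field,      -- name
       split_part(E'name:\n-ABC\n-XYZ', E'\n-', 2) as old_value,  -- ABC
       split_part(E'name:\n-ABC\n-XYZ', E'\n-', 3) as new_value;  -- XYZ
```

Splitting on a newline followed by a dash removes both the line break and the leading - from each value in one step.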

Putting this together with the first step gives the complete query:

select companyID, 
       updated_at, 
       split_part(updates, ':', 1) as field, 
       split_part(updates, E'\n-', 2) as old_value, 
       split_part(updates, E'\n-', 3) as new_value  
  from (select companyID, 
               updated_at, 
               regexp_split_to_table(updates, '\n(?=\y.+:)') as updates 
          from old
       ) as t  -- a subquery in FROM needs an alias before PostgreSQL 16
;

Here is a db<>fiddle example.

More details / additional information:

newline character in postgres strings
postgresql regex word boundaries
splitting strings into new columns
