ORDER BY with diacritics in Postgres

Question

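I need to select data from a table and sort it with an ORDER BY clause. The problem is that the column contains text data with Czech diacritics. I cannot use COLLATE, because the database is part of a Postgres cluster that was created with lc_collate = en_US.UTF-8, and I cannot afford the downtime that recreating the cluster with the correct lc_collate would cause.

Sample data: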

CREATE TABLE test (
  id serial PRIMARY KEY,
  name text
);

INSERT INTO test (name) VALUES ('Žoo'), ('Zoo'), ('ŽOO'), ('ZOO'),
  ('ŽoA'), ('ŽóA'), ('ŽoÁ'), ('ŽóÁ');

Desired output:

SELECT * FROM test ORDER BY name COLLATE "cs_CZ.utf8";
id | name
----+------
  2 | Zoo
  4 | ZOO
  5 | ŽoA
  7 | ŽoÁ
  6 | ŽóA
  8 | ŽóÁ
  1 | Žoo
  3 | ŽOO
(8 rows)

I found one solution here:

SELECT * FROM test ORDER BY name USING ~<~;
id | name
----+------
  4 | ZOO
  2 | Zoo
  3 | ŽOO
  5 | ŽoA
  1 | Žoo
  7 | ŽoÁ
  6 | ŽóA
  8 | ŽóÁ
(8 rows)

The result is close enough (for my purposes): letters with a caron sort after letters without one.

My slightly off-topic Postgresql anabasis with the ~<~ operator

edit: turned into.

Back to the question: is there any solution, other than recreating the Postgres cluster with the correct locale, to get the desired order?

Some insight into the ~<~ operator would also be nice.
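
On the ~<~ operator: as far as I can tell, it is the less-than operator used by the text_pattern_ops operator class, and it compares strings character by character while ignoring the locale's collation rules. If that reading is right, sorting with the built-in "C" collation should give the same order as the USING ~<~ query above. This is a sketch under that assumption, not something verified on the cluster in question:

```sql
-- "C" is a built-in collation that compares strings byte by byte,
-- so it needs no extra locale to be installed on the server.
SELECT * FROM test ORDER BY name COLLATE "C";
```

That would also explain why the caroned letters land last: in UTF-8, Ž is encoded as two bytes that both sort above every ASCII letter.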

Answer 1

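I'm not sure I understand the question, because it looks like you have already found a solution. The only thing I can suggest is adding a new field, czechName, with the correct collation.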

http://www.postgresql.org/docs/current/static/sql-altertable.html

ADD [ COLUMN ] column_name data_type [ COLLATE collation ] [ column_constraint [ ... ] ]
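
A minimal sketch of that suggestion against the test table from the question. It assumes a collation named "cs_CZ.utf8" already exists in the database, and czech_name is a hypothetical column name (an unquoted czechName would be folded to lower case anyway):

```sql
-- Assumes the "cs_CZ.utf8" collation already exists in this database.
-- czech_name is a hypothetical column name.
ALTER TABLE test ADD COLUMN czech_name text COLLATE "cs_CZ.utf8";

-- Copy the existing values; the copy does not stay in sync by itself.
UPDATE test SET czech_name = name;

-- ORDER BY uses the column's own collation, so no COLLATE clause is needed.
SELECT id, name FROM test ORDER BY czech_name;
```

The obvious cost is a duplicated column that has to be kept up to date on every insert and update.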

Answer 2

Similar to Juan Carlos Oropeza's suggestion, you could try changing the collation of the column:

ALTER TABLE test ALTER COLUMN "name" TYPE text COLLATE "cs_CZ.utf8";

Reference: http://www.postgresql.org/docs/current/static/sql-altertable.html
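
One way to check that the change took effect, sketched on the assumption that the ALTER TABLE above succeeded and that only one table named test is visible:

```sql
-- Show which collation the column carries now
-- (NULL means the database default is still in use).
SELECT column_name, collation_name
FROM information_schema.columns
WHERE table_name = 'test' AND column_name = 'name';

-- A plain ORDER BY then sorts by the column's collation.
SELECT * FROM test ORDER BY name;
```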

Answer 3

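As @Igor pointed out in his comment, there is no need to recreate the Postgres cluster with a different lc_collate and deal with the resulting downtime.

The specific steps to solve the problem are:

Add or uncomment this line in /etc/locale.gen: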

cs_CZ.UTF-8 UTF-8

Generate the new locale:

# locale-gen

Define the new collation in Postgres:

CREATE COLLATION "cs_CZ.utf8" ( locale = 'cs_CZ.UTF-8' );
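
Once the collation exists, the query from the question should work as written. A short sketch of how it might be used, run in the same database where the collation was created; the index name is hypothetical:

```sql
-- Confirm the collation is visible in the current database.
SELECT collname FROM pg_collation WHERE collname = 'cs_CZ.utf8';

-- Use it per query, without touching the cluster's lc_collate.
SELECT * FROM test ORDER BY name COLLATE "cs_CZ.utf8";

-- Optional: an index that can serve this ordering
-- (test_name_cs_idx is a hypothetical name).
CREATE INDEX test_name_cs_idx ON test (name COLLATE "cs_CZ.utf8");
```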
