How to configure PostgreSQL tokenization for full-text search?

Question

This works as expected:

# select to_tsvector('SICK FOTOCEL VS#VE180-P132') @@ 'p132'::tsquery;
 ?column? 
----------
 t

However, when "#" is replaced with "/", I get

# select to_tsvector('SICK FOTOCEL VS/VE180-P132') @@ 'p132'::tsquery;
 ?column? 
----------
 f

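This is because VS/VE180-P132 is classified as a file token, which is wrong for our use case. How can I change this behavior, for example by removing the token types email, url and file?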

Answer 1

You cannot change this behavior unless you want to write a new parser in C.
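
To check which token types the default parser assigns to a given string, the built-in ts_debug function can be queried. This is only a diagnostic sketch; the exact rows depend on the active text search configuration, so no output is shown here.

```sql
-- List each token the parser produces together with its token type alias.
-- For the string from the question, the part containing '/' is expected
-- to show up with the alias 'file' rather than as separate words.
SELECT alias, token
FROM ts_debug('SICK FOTOCEL VS/VE180-P132');
```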

But you can use the following workaround: replace the problematic characters in every string before running the full-text search on it:

-- The 'g' flag replaces every occurrence, not only the first match.
SELECT to_tsvector(regexp_replace('SICK FOTOCEL VS/VE180-P132', '[/.]', ' ', 'g'))
       @@ to_tsquery(regexp_replace('p132', '[/.]', ' ', 'g'));
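
Because the same replacement has to be applied to both the document and the query, it may help to wrap it in a small function. The following is a minimal sketch: the function name fts_normalize and its parameter name are hypothetical, and plainto_tsquery is used here instead of to_tsquery on the assumption that the search terms are plain words, since a replacement can introduce spaces that to_tsquery would reject as a syntax error.

```sql
-- Hypothetical helper (not a built-in): replaces '/' and '.' with spaces
-- so the parser no longer sees file, url or host tokens.
CREATE FUNCTION fts_normalize(raw_text text) RETURNS text
LANGUAGE sql IMMUTABLE
AS $$
    SELECT regexp_replace(raw_text, '[/.]', ' ', 'g');
$$;

-- Apply the same normalization on both sides of the match.
SELECT to_tsvector(fts_normalize('SICK FOTOCEL VS/VE180-P132'))
       @@ plainto_tsquery(fts_normalize('p132'));
```

Keeping the character class in one place means the indexed text and the search terms cannot drift apart if more characters need to be replaced later.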
