Oracle REGEXP_REPLACE and retain part of it

Question

I have text in a column, something like

Hello World %UC#abc#UC%. How are you %UC#def#UC%. Have a nice day %UC#ghi#UC%.

I want to use REGEXP_REPLACE (or any other function) to replace %UC#< value >#UC% with UNISTR(< value >). For the example above, the result should be

Hello World (UNISTR of abc). How are you (UNISTR of def). Have a nice day (UNISTR of ghi).

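Basically, it should strip the %UC# and #UC% markers and replace the value between them with the UNISTR of that value.

Is there any way to achieve this?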

Answer 1

This is one way to do it in 11g and later (LISTAGG was introduced in 11g Release 2):

with test(s) as ( select 'Hello World %UC#abc#UC%. How are you %UC#def#UC%. Have a nice day %UC#ghi#UC%.' || '%UC##UC%' from dual)
select listagg (str) within group ( order by lev)
from (
        select regexp_substr(s, '(^|#UC%)(.*?)(%UC#)', 1, level, '', 2) || 
               UPPER(regexp_substr(s, '(%UC#)(.*?)(#UC%)', 1, level, '', 2)) as str,
               level as lev
        from test
        connect by instr(s, '%UC#', 1, level ) > 0
     )

This gives (I used UPPER instead of UNISTR to make the result clear):

Hello World ABC. How are you DEF. Have a nice day GHI.
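
To get what the question actually asks for, UPPER can be swapped for UNISTR. The following is a minimal sketch rather than a tested drop-in. The sample string with its \00E9 and \00FC escapes is made up for illustration. It assumes that every value between the markers is valid UNISTR input (a stray backslash would raise an error) and that the database character set can represent the decoded characters, since LISTAGG returns VARCHAR2.

```sql
-- Hypothetical input: the values between the markers are UNISTR-style \XXXX escapes.
with test(s) as (
    select 'Caf%UC#\00E9#UC% and M%UC#\00FC#UC%nchen' || '%UC##UC%' from dual
)
select listagg(str) within group (order by lev) as result
from (
        select regexp_substr(s, '(^|#UC%)(.*?)(%UC#)', 1, level, '', 2) ||
               -- UNISTR returns NVARCHAR2; TO_CHAR converts it back to the database character set
               to_char(unistr(regexp_substr(s, '(%UC#)(.*?)(#UC%)', 1, level, '', 2))) as str,
               level as lev
        from test
        connect by instr(s, '%UC#', 1, level) > 0
     )
```

On a database with a Unicode character set this is expected to decode the two escapes into the corresponding accented characters and leave the surrounding text untouched.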

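The idea here is to use the usual string-splitting technique, treating the parts wrapped in '%UC#...#UC%' as separators. Note that I appended a short string ('%UC##UC%') to the input so that its last part is handled too: the query then sees the string as ending with an (empty) '%UC#...#UC%' sequence.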
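
To see the split at work, it can help to run the inner query on its own, without the aggregation. This is only an illustrative sketch: the column aliases plain_part and tagged_part are names chosen here and do not appear in the query above.

```sql
with test(s) as (
    select 'Hello World %UC#abc#UC%. How are you %UC#def#UC%. Have a nice day %UC#ghi#UC%.'
           || '%UC##UC%'
    from dual
)
select level as lev,
       -- text between the previous #UC% (or the start of the string) and the next %UC#
       regexp_substr(s, '(^|#UC%)(.*?)(%UC#)', 1, level, '', 2) as plain_part,
       -- value enclosed by the level-th %UC# ... #UC% pair
       regexp_substr(s, '(%UC#)(.*?)(#UC%)', 1, level, '', 2) as tagged_part
from test
-- one row per occurrence of %UC#, including the empty pair appended at the end
connect by instr(s, '%UC#', 1, level) > 0
order by lev
```

Each row should pair one stretch of ordinary text with the marked value that follows it. The last row comes from the appended '%UC##UC%', so its tagged_part is empty and only the trailing text contributes to the result.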

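In Oracle 10g we cannot use listagg and regexp_substr the way I did above, so the solution is a bit more complicated.

Here I use no regular expressions at all and compute the aggregation with SYS_CONNECT_BY_PATH; for this I need to pick a string that will never appear in your input text, say '@@':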

with test as ( select 'Hello World %UC#abc#UC%. How are you %UC#def#UC%. Have a nice day %UC#ghi#UC%.' || '%UC##UC%' as s from dual)

select replace ( sys_connect_by_path (
                   -- plain text that precedes the current %UC# marker
                   substr(s,
                          case when level = 1 then 1 else instr(s, '#UC%', 1, level-1) + 4 end,
                          instr(s, '%UC#', 1, level)
                            - case when level = 1 then 1 else instr(s, '#UC%', 1, level-1) + 4 end ) ||
                   -- value between %UC# and #UC%, transformed
                   UPPER(substr(s,
                                instr(s, '%UC#', 1, level) + 4,
                                instr(s, '#UC%', 1, level) - (instr(s, '%UC#', 1, level) + 4)) )
                 , '@@'
                 ),
                 '@@') str
from test
-- keep only the deepest row, whose path holds every piece
where connect_by_isleaf = 1
connect by instr(s, '%UC#', 1, level ) > 0
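
The nested substr/instr arithmetic is easier to follow when the positions are listed per level. The sketch below only exposes those positions. The aliases plain_start, tag_open and tag_close are names invented here for illustration.

```sql
with test as (
    select 'Hello World %UC#abc#UC%. How are you %UC#def#UC%. Have a nice day %UC#ghi#UC%.'
           || '%UC##UC%' as s
    from dual
)
select level as lev,
       -- first character after the previous #UC% (position 1 on the first level)
       case when level = 1 then 1 else instr(s, '#UC%', 1, level - 1) + 4 end as plain_start,
       -- position of the level-th opening marker %UC#
       instr(s, '%UC#', 1, level) as tag_open,
       -- position of the level-th closing marker #UC%
       instr(s, '#UC%', 1, level) as tag_close
from test
connect by instr(s, '%UC#', 1, level) > 0
order by lev
```

With these names, the plain text for a level is substr(s, plain_start, tag_open - plain_start) and the marked value is substr(s, tag_open + 4, tag_close - (tag_open + 4)), which are the two expressions concatenated inside SYS_CONNECT_BY_PATH above.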
