Regex (All after first match (without the first match))

Question

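I'm struggling with a simple regex. Basically, I want everything after the first occurrence of "_", without that "_" itself.

My current expression looks like this: _(.*)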

When I input: AAA_BBB_CCC

The output is: _BBB_CCC

My ideal output is: BBB_CCC

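I am using the Snowflake database with its built-in regex functions.

Unfortunately, I cannot use (?<=_).* because Snowflake does not support the "?<=" lookbehind syntax. Is there another way to modify _(.*) to get the correct output?

Thanks.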

Answer 1

Use a capture group:

\_(?<data>.*)

This returns a capture group named data that contains BBB_CCC.

Example: https://regex101.com/r/xZaXKR/1
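
To try the named group outside regex101, here is a minimal JavaScript sketch. The helper name extractAfterFirstUnderscore is made up for illustration, and the sketch assumes an engine that supports named groups (modern JavaScript does; whether Snowflake accepts this syntax is not confirmed here).

```javascript
// Hypothetical helper: returns the text after the first "_", or null if there is none.
function extractAfterFirstUnderscore(input) {
  // "." does not match newlines by default, so this assumes single-line input.
  const match = input.match(/_(?<data>.*)/);
  return match ? match.groups.data : null;
}

console.log(extractAfterFirstUnderscore("AAA_BBB_CCC")); // BBB_CCC
console.log(extractAfterFirstUnderscore("AAA"));         // null
```

The full match is still _BBB_CCC; only the named group drops the leading underscore.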

Answer 2

You can do this with a regular expression. In JavaScript, for example, this does the job:

"AAA_BBB_CCC".replace(/^[^_]+_/, '')

With Snowflake, use REGEXP_REPLACE:

regexp_replace('AAA_BBB_CCC','^[^_]+_','')

https://docs.snowflake.net/manuals/sql-reference/functions/regexp_replace.html

But you can also find the index of the first _ and take a substring, which is possible in any language:

let text = "AAA_BBB_CCC"
let index = text.indexOf('_')
// Assumption: fall back to the whole string when there is no "_"
let result = text
if (index !== -1) {
    // Keep everything after the first "_"
    result = text.substring(index + 1)
}
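
The two approaches in this answer can be compared on a few edge cases with the sketch below. The function names are hypothetical, and the pattern uses * where the answer uses +, a deliberate tweak so that a leading underscore is also handled.

```javascript
// Hypothetical helpers wrapping the regex and the indexOf approaches.
function afterFirstByReplace(text) {
  // Remove everything up to and including the first "_".
  return text.replace(/^[^_]*_/, '');
}

function afterFirstByIndex(text) {
  const index = text.indexOf('_');
  // Assumption: return the input unchanged when there is no "_".
  return index === -1 ? text : text.substring(index + 1);
}

for (const sample of ["AAA_BBB_CCC", "AAA", "_BBB", "AAA_"]) {
  console.log(JSON.stringify(sample), "->",
    JSON.stringify(afterFirstByReplace(sample)),
    JSON.stringify(afterFirstByIndex(sample)));
}
```

Both helpers should agree on every sample, including the input without any underscore, which is returned as-is.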

Answer 3

In Snowflake SQL, you can use REGEXP_SUBSTR. Its syntax is:

REGEXP_SUBSTR( <string> , <pattern> [ , <position> [ , <occurrence> [ , <regex_parameters> [ , <group_num> ] ] ] ] )

The function lets you return a captured substring:

By default, REGEXP_SUBSTR returns the entire matching part of the subject. However, if the e (for “extract”) parameter is specified, REGEXP_SUBSTR returns the part of the subject that matches the first group in the pattern. If e is specified but a group_num is not also specified, then the group_num defaults to 1 (the first group). If there is no sub-expression in the pattern, REGEXP_SUBSTR behaves as if e was not set.

So you need to set regex_parameters to e and, optionally, the group_num argument to 1:

Select REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'e', 1)
Select REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'e')
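
To make the difference between the default behavior and the e parameter concrete, here is a rough JavaScript emulation. It is only a sketch: regexpSubstrSketch is a hypothetical name, the position and occurrence arguments are ignored, and it mirrors only the behavior described in the quoted documentation, not Snowflake's implementation.

```javascript
// Hypothetical emulation of the 'e' (extract) behavior, for illustration only.
function regexpSubstrSketch(subject, pattern, extract = false, groupNum = 1) {
  const match = subject.match(new RegExp(pattern));
  if (!match) return null;
  // With extract set and at least one group in the pattern, return that group.
  if (extract && match.length > 1) {
    return match[groupNum] !== undefined ? match[groupNum] : null;
  }
  // Otherwise return the entire matching part, as if 'e' was not set.
  return match[0];
}

console.log(regexpSubstrSketch("AAA_BBB_CCC", "_(.*)"));       // _BBB_CCC
console.log(regexpSubstrSketch("AAA_BBB_CCC", "_(.*)", true)); // BBB_CCC
```

The first call shows why the original query returned the leading underscore; the second corresponds to passing 'e'.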

Answer 4

To make it actually work, you need to use:

SELECT REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'e', 1);

which gives:

REGEXP_SUBSTR('AAA_BBB_CCC', '_(.*)', 1, 1, 'E', 1)
BBB_CCC

You need to pass e as the <regex_parameters> argument of REGEXP_SUBSTR, because that is what extracts sub-matches. So Wiktor's answer is 95% correct.

regex

snowflake-cloud-data-platform