AutoHotkey RegExReplace: skip unmatched pattern

Question

How can I skip lines that do not match when replacing input with a regular expression?

For example, below is the content of my test.txt:

elkay_iyer@yahoo.com
elkay_qwer@yahoo.com
elke engineering ltd.,@yahoo.com
elke0265@yahoo.com
elke@yahoo.com

Below is my AutoHotkey script with the regex code:

ReplaceEmailsRegEx := "i)([a-z0-9]+(\.*|\_*|\-*))+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}"
RemoveDuplicateCharactersRegEx := "s)(.)(?=.*)"

Try{
    FileRead, EmailFromTxtFile, test.txt
    OtherThanEmails := RegExReplace(EmailFromTxtFile,ReplaceEmailsRegEx)
    Chars := RegExReplace(OtherThanEmails,RemoveDuplicateCharactersRegEx)
    Loop{
        StringReplace, OtherThanEmails, OtherThanEmails, `r`n`r`n,`r`n, UseErrorLevel
        If ErrorLevel = 0
            Break
    }
    If (StrLen(OtherThanEmails)){
        Msgbox The Characters found other than email:`n%OtherThanEmails%
    }
}
catch e {
    ErrorString:="what: " . e.what . "file: " . e.file . " line: " . e.line . " msg: " . e.message . " extra: " . e.extra
    Msgbox An Exception was thrown`n%ErrorString%
}
Return

It throws an error when it runs the replacement on test.txt:

e.what contains 'RegExReplace', e.line is 10

When I remove the 3rd email from test.txt, it runs without errors. So how can I change my regex to skip the problematic string?

Answer 1

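The problem you are running into is catastrophic backtracking, caused by the nested quantifiers at the start of the pattern: ([a-z0-9]+(\.*|\_*|\-*))+. Because of the * quantifiers, the ., _ and - are all optional, so your pattern effectively reduces to ([a-z0-9]+)+. On a long run of letters that is not followed by @, the engine then tries every possible way of splitting that run between the inner and the outer +, and the number of attempts grows exponentially with the length of the run.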
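To confirm that the failure comes from the regex engine giving up rather than from an ordinary non-match, one can inspect ErrorLevel after the call. The following is a minimal sketch in AutoHotkey v1 syntax, assuming the call is made outside a Try block (inside Try, the same failure surfaces as an exception, as in the question). The variable names Needle, Line and Result are made up for illustration. According to the AutoHotkey documentation, a negative ErrorLevel means an error occurred while the regular expression was executing; PCRE's "match limit reached" code is -8, but the exact value you see may differ.

```autohotkey
; Illustrative check only (AutoHotkey v1 syntax); variable names are hypothetical.
; Needle is the pattern from the question, Line is the third line of test.txt.
Needle := "i)([a-z0-9]+(\.*|\_*|\-*))+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}"
Line := "elke engineering ltd.,@yahoo.com"

Result := RegExReplace(Line, Needle)

; 0 means the call completed normally, even if nothing was replaced.
; Any other value means the pattern failed to compile or to execute.
if (ErrorLevel != 0)
    MsgBox, RegExReplace failed with error code %ErrorLevel%
else
    MsgBox, Result: %Result%
```

If the failure described in the question reproduces, the first message box should appear. An ordinary line that simply contains no email address would instead leave ErrorLevel at 0.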

I suggest "unrolling" the first subpattern to make it linear:

i)[a-z0-9]+(?:(?:\.+|_+|-+)[a-z0-9]+)*@([a-z][-a-z0-9]+\.)+[a-z]{2,6}

or

i)[a-z0-9]+(?:([._-])\1*[a-z0-9]+)*@(?:[a-z][-a-z0-9]+\.)+[a-z]{2,6}

If no more than one ., _ or - is allowed between the "words", you can even drop the \1*.

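Also, the \-* alternative in (\.|\-*\.) is not needed, because the hyphen is already matched by the preceding character class. That subpattern can therefore be reduced to \. alone.

See the regex demo.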
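As a rough illustration of how the unrolled pattern could be used, here is a sketch in AutoHotkey v1 syntax that processes the input line by line. It is not the asker's script: the sample data is inlined rather than read from test.txt, the names EmailRegEx, Sample, Rest and Leftover are hypothetical, and skipping a line when ErrorLevel is nonzero is just one possible policy.

```autohotkey
; Sketch only (AutoHotkey v1 syntax); names and the skip policy are assumptions.
EmailRegEx := "i)[a-z0-9]+(?:(?:\.+|_+|-+)[a-z0-9]+)*@(?:[a-z][-a-z0-9]+\.)+[a-z]{2,6}"

; Three of the lines from test.txt, inlined so the sketch is self-contained.
Sample := "elkay_iyer@yahoo.com`nelke engineering ltd.,@yahoo.com`nelke0265@yahoo.com"

Leftover := ""
Loop, Parse, Sample, `n, `r
{
    ; Remove every address the pattern matches; what remains is non-email text.
    Rest := RegExReplace(A_LoopField, EmailRegEx)
    if (ErrorLevel != 0)    ; the regex engine reported an error for this line
        continue            ; skip it instead of aborting the whole run
    Rest := Trim(Rest)
    if (Rest != "")
        Leftover .= Rest . "`n"
}

MsgBox, Text other than emails:`n%Leftover%
```

With this sample, only the second line should be left over, since nothing directly before its @ forms a valid local part. Working line by line also means that a single problematic line cannot make the replacement fail for the entire file.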
