Regex to Match Complex WHERE Clause for a Certain Table

I have a program that takes a restricted SQL Server WHERE clause and removes the parts that refer to a specific table. An example of such a WHERE clause is

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y')

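I need to remove every part of the query that uses the table Episode, allowing for parentheses () that wrap statements, square brackets around field names, and so on. To do this I have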

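private string BuildResourceWhereClauses(string whereClauses, string episodeTable)
{
    // $@ (verbatim interpolated string) is required: with a plain $"..." the
    // regex escapes such as \s are rejected by the C# compiler.
    Regex r = new Regex(
        $@"AND\s+\(*\[*{episodeTable}\]*\.\[*\w+\]*\s*(=|<>|<=|>=)(\s*\'*(NULL|\S+|\((.*?)\)+)\'*\s*\)*){{1}}",
        RegexOptions.IgnoreCase);

    // Remove every matching Episode clause, then restore the leading space.
    string tmp = r.Replace(whereClauses, String.Empty).Trim();
    return $" {tmp}";
}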

This works well, returning

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)

But now I have been asked to extend it so that we allow all SQL WHERE clause syntax. So we can now have a WHERE clause like

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y') AND (Episode.Paste = 'Y') AND [Episode].[Source] = '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD')

that we want to "parse", so I modified the method above to

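private string BuildResourceWhereClauses(string whereClauses, string episodeTable)
{
    // $@ (verbatim interpolated string) is required: with a plain $"..." the
    // regex escapes such as \s are rejected by the C# compiler.
    Regex r = new Regex(
        $@"AND\s+\(*\[*{episodeTable}\]*\.\[*\w+\]*\s*(=|<>|<=|>=|LIKE|IN|NOT IN|IS|BETWEEN\s+\w+\s+AND)(\s*\'*(NULL|\S+|\((.*?)\)+)\'*\s*\)*){{1}}",
        RegexOptions.IgnoreCase);

    // Remove every matching Episode clause, then restore the leading space.
    string tmp = r.Replace(whereClauses, String.Empty).Trim();
    return $" {tmp}";
}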

With episodeTable = "Episode", what I get back is

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) 'POD')

This fails to fully match AND (Episode.Paste = 'Y'), AND [Episode].[Source] = '%6' and AND [Episode].[TFC] NOT IN ('LWC', 'POD'): a stray 'POD') is left behind.
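The failure can be reproduced in isolation. The sketch below wraps the modified pattern in a hypothetical static class (the names WhereClauseRepro and Strip are illustrative, not from the original program) and uses the $@ string form so that it compiles. As far as I can tell, the stray 'POD') appears because the \S+ alternative is tried before the parenthesized one and stops at the first space inside the IN list.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper used only to reproduce the behaviour described above.
public static class WhereClauseRepro
{
    public static string Strip(string whereClauses, string episodeTable)
    {
        // Same pattern as the modified method; \S+ comes before \((.*?)\)+,
        // so "('LWC'," is consumed and the match ends at the following space.
        var r = new Regex(
            $@"AND\s+\(*\[*{episodeTable}\]*\.\[*\w+\]*\s*(=|<>|<=|>=|LIKE|IN|NOT IN|IS|BETWEEN\s+\w+\s+AND)(\s*\'*(NULL|\S+|\((.*?)\)+)\'*\s*\)*){{1}}",
            RegexOptions.IgnoreCase);

        return r.Replace(whereClauses, string.Empty).Trim();
    }
}
```

Calling WhereClauseRepro.Strip with the WHERE clause above and "Episode" should return the same text as shown in the question, ending in 'POD').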

  1. What is wrong with the regex, and how can I modify it to return what I want?

  2. Rather than making this regex more complicated, can it be simplified?

Thank you for your time.


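The answer below removed some of my previous functionality (my fault for not stipulating that I needed to keep it! That is also what makes this so difficult: "catching all cases"). So I need to match this string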

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y') AND Episode.FRC BETWEEN 10 AND 20 AND Episode.Dt between '2011/02/25' and '2011/02/27' AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y' AND Episode.TFC IS NOT LIKE '655r%') AND (Episode.Paste = 'Y') AND [Episode].[Source] IS NOT LIKE '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD') AND [Episode].[TFC] IS NULL

So in C#, I need the following code

string whereClauses =
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) " + 
    "AND ([Episode].[YN] = 'Y') AND Episode.FRC BETWEEN 10 AND 20 AND Episode.Dt between '2011/02/25' and '2011/02/27' " +
    "AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND ([Episode].[YN] = 'Y' AND Episode.TFC IS NOT LIKE '655r%') " +
    "AND (Episode.Paste = 'Y') AND [Episode].[Source] IS NOT LIKE '%6' AND [Episode].[TFC] NOT IN ('LWC', 'POD') AND [Episode].[TFC] IS NULL";
string tmp = r.Replace(whereClauses, String.Empty).Trim();

to set tmp to

AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null) AND (Util.Source='IP%' AND Util.ReqType = 'IP') AND (Util.Epinum is null)

removing all Episode clauses, including the BETWEEN statements and the IS NOT NULL / IS NULL statements. I have tried

AND\s+\(*\[*Episode\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN|(?:IS\s+)?LIKE|(?:IS\s+NOT\s+)?LIKE|BETWEEN(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)AND)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)

But this does not match

Episode.TFC IS NULL
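One possible way to cover this case without growing the pattern further is to split the clause into its top-level AND terms and drop any term that mentions the table. The sketch below is only an illustration under several assumptions: the top-level terms are joined by AND only, a parenthesized group that mentions the table is dropped as a whole (even if it also mentions other tables), and the table name never appears inside a string literal. TopLevelTermFilter and its methods are hypothetical names, not part of the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits on top-level ANDs instead of matching operators.
public static class TopLevelTermFilter
{
    private static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

    // True when keyword starts at index i as a whole word (case-insensitive).
    private static bool IsKeywordAt(string s, int i, string keyword)
    {
        int end = i + keyword.Length;
        if (end > s.Length) return false;
        if (string.Compare(s, i, keyword, 0, keyword.Length,
                           StringComparison.OrdinalIgnoreCase) != 0) return false;
        bool boundaryBefore = i == 0 || !IsWordChar(s[i - 1]);
        bool boundaryAfter = end == s.Length || !IsWordChar(s[end]);
        return boundaryBefore && boundaryAfter;
    }

    private static void AddTerm(List<string> terms, string raw)
    {
        string term = raw.Trim();
        if (term.Length > 0) terms.Add(term);
    }

    // Splits "AND t1 AND t2 ..." into terms, ignoring ANDs that sit inside
    // parentheses, inside quotes, or that belong to a BETWEEN ... AND ... range.
    public static List<string> SplitTopLevelTerms(string whereClauses)
    {
        var terms = new List<string>();
        int depth = 0, start = 0;
        bool inQuote = false, pendingBetween = false;

        for (int i = 0; i < whereClauses.Length; i++)
        {
            char c = whereClauses[i];
            if (c == '\'') { inQuote = !inQuote; continue; }
            if (inQuote) continue;
            if (c == '(') { depth++; continue; }
            if (c == ')') { depth--; continue; }
            if (depth != 0) continue;

            if (IsKeywordAt(whereClauses, i, "BETWEEN"))
            {
                pendingBetween = true;          // next top-level AND is part of the range
                i += "BETWEEN".Length - 1;
            }
            else if (IsKeywordAt(whereClauses, i, "AND"))
            {
                if (pendingBetween)
                {
                    pendingBetween = false;     // this AND closes the BETWEEN range
                }
                else
                {
                    AddTerm(terms, whereClauses.Substring(start, i - start));
                    start = i + 3;
                }
                i += 2;
            }
        }
        AddTerm(terms, whereClauses.Substring(start));
        return terms;
    }

    // Keeps only the terms that do not reference the given table.
    public static string RemoveTableTerms(string whereClauses, string table)
    {
        var mentionsTable = new Regex(
            $@"(?<![\w\]])\[?{Regex.Escape(table)}\]?\s*\.", RegexOptions.IgnoreCase);
        var kept = new List<string>();
        foreach (string term in SplitTopLevelTerms(whereClauses))
        {
            if (!mentionsTable.IsMatch(term)) kept.Add(term);
        }
        return kept.Count == 0 ? string.Empty : "AND " + string.Join(" AND ", kept);
    }
}
```

With the long WHERE clause above, TopLevelTermFilter.RemoveTableTerms(whereClauses, "Episode") should leave only the four Util terms. Unlike the original method, it does not prepend a leading space.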

It looks like you can extend your pattern as follows:

$@"AND\s+\(*\[*{episodeTable}\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)"

See the regex demo.
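To try this pattern from C#, a minimal sketch could look like the following. ExtendedPatternDemo, Build and Strip are hypothetical names, and the Regex.Escape call is my own addition so that a table name containing regex metacharacters is treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the extended pattern shown above.
public static class ExtendedPatternDemo
{
    public static Regex Build(string episodeTable) =>
        new Regex(
            // The parenthesized alternative now comes before \S+, so a whole
            // IN (...) list is consumed instead of stopping at the first space.
            $@"AND\s+\(*\[*{Regex.Escape(episodeTable)}\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)",
            RegexOptions.IgnoreCase);

    public static string Strip(string whereClauses, string episodeTable) =>
        Build(episodeTable).Replace(whereClauses, string.Empty).Trim();
}
```

Applied to the WHERE clause with the NOT IN ('LWC', 'POD') term, Strip should leave only the two Util terms. Note that the operator group covers only <>, =, <=, >=, IN and NOT IN, so a clause such as AND [Episode].[TFC] IS NULL is not matched by this pattern.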

Details

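  • AND - a literal substring
  • \s+ - 1+ whitespace characters
  • \(* - 0+ ( characters
  • \[* - 0+ [ characters
  • Episode - the name of the table
  • \]* - 0+ ] characters
  • \. - a . character
  • \[* - 0+ [ characters
  • \w+ - 1+ word characters
  • \]* - 0+ ] characters
  • \s* - 0+ whitespace characters
  • (<>|[><]?=|(?:NOT\s+)?IN) - Group 1: <>, <=, >=, =, NOT IN or IN
  • (\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*) - Group 2:
    • \s* - 0+ whitespace characters
    • \'* - 0+ ' characters
    • (\((.*?)\)+|NULL|\S+) - Group 3:
      • \( - a ( character
      • (.*?) - Group 4: any 0+ characters other than a newline, as few as possible
      • \)+ - 1+ ) characters
      • | - or
      • NULL - the NULL substring
      • | - or
      • \S+ - 1+ non-whitespace characters
    • \'* - 0+ ' characters
    • \s* - 0+ whitespace characters
    • \)* - 0+ ) characters.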
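The group numbering in the breakdown above can be checked by inspecting the captures for a single clause. This is a small sketch with a hypothetical GroupInspector class; the table name is hard-coded as Episode to keep it short.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that reports what Groups 1, 3 and 4 captured.
public static class GroupInspector
{
    private static readonly Regex Pattern = new Regex(
        @"AND\s+\(*\[*Episode\]*\.\[*\w+\]*\s*(<>|[><]?=|(?:NOT\s+)?IN)(\s*\'*(\((.*?)\)+|NULL|\S+)\'*\s*\)*)",
        RegexOptions.IgnoreCase);

    public static string Describe(string clause)
    {
        Match m = Pattern.Match(clause);
        if (!m.Success) return "no match";

        // Group 1 = operator, Group 3 = value, Group 4 = inside of a (...) list.
        // An unmatched group has an empty Value.
        return $"op=[{m.Groups[1].Value}] value=[{m.Groups[3].Value}] list=[{m.Groups[4].Value}]";
    }
}
```

For AND [Episode].[TFC] NOT IN ('LWC', 'POD') this should report op=[NOT IN] value=[('LWC', 'POD')] list=['LWC', 'POD']. For AND ([Episode].[YN] = 'Y') it should report op=[=] value=[Y')] list=[], which shows that \S+ also swallows the closing quote and parenthesis when the value is a simple literal.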