Solr DataImportHandler RegexTransformer usage: skip field by pattern

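I am reading the value of column_name from table_name into the field fieldName of my entity, and I want to exclude one particular value, negatory. I found that I can do this with a conditional expression in SQL, for example: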

select 
   CASE WHEN column_name = 'negatory' 
      THEN null 
      ELSE column_name 
   END AS fieldName 
from table_name

However, I could not get my original implementation, which uses RegexTransformer, to work (pseudocode data-config.xml):

<entity query="select column_name from table_name">
    <field column="fieldName" sourceColName="column_name" regex="^(?!negatory)$" />
</entity>

I think the SQL solution is fine (probably better), but I am curious how, or whether, it can be done with RegexTransformer.

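Update (incorporating suggestions from stribizhev)

The more I think about it, the less convinced I am that a negative lookahead does what I want. I ran a quick test: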

    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    Pattern[] pp = new Pattern[] {
        Pattern.compile("^(?!negatory)$"), // original attempt, only matches empty string
        Pattern.compile("^((?!negatory).)*$"), // suggestion from stribizhev
        Pattern.compile("^(?!.*negatory).*$"), // suggestion from stribizhev
        Pattern.compile("^((?!negatory)|(.+(?=negatory)?.*)|(.*(?=negatory)?.+))$") // latest attempt
    };
    String[] ss = new String[] {
        "contains substring negatory but should match",
        "should match",
        "negatory start should match",
        "should match with trailing negatory",
        "negatory"
    };
    int pi = 0;
    for (Pattern p : pp) {
        ++pi;
        int si = 0;
        for (String s : ss) {
            ++si;
            Matcher m = p.matcher(s);
            if (m.find()) {
                int count = m.groupCount();
                System.out.println(String.format("%s groups for pattern/string: %s/%s", count, pi, si));
                for (int i = 0; i <= count; ++i) {
                    System.out.println(String.format("\tgroup %s: %s", i, m.group(i)));
                }
            } else {
                System.out.println(String.format("no match for pattern/string: %s/%s", pi, si));
            }
        }
    }

This produced the following results:

no match for pattern/string: 1/1
no match for pattern/string: 1/2
no match for pattern/string: 1/3
no match for pattern/string: 1/4
no match for pattern/string: 1/5
no match for pattern/string: 2/1
1 groups for pattern/string: 2/2
    group 0: should match
    group 1: h
no match for pattern/string: 2/3
no match for pattern/string: 2/4
no match for pattern/string: 2/5
no match for pattern/string: 3/1
0 groups for pattern/string: 3/2
    group 0: should match
no match for pattern/string: 3/3
no match for pattern/string: 3/4
no match for pattern/string: 3/5
3 groups for pattern/string: 4/1
    group 0: contains substring negatory but should match
    group 1: contains substring negatory but should match
    group 2: contains substring negatory but should match
    group 3: null
3 groups for pattern/string: 4/2
    group 0: should match
    group 1: should match
    group 2: should match
    group 3: null
3 groups for pattern/string: 4/3
    group 0: negatory start should match
    group 1: negatory start should match
    group 2: negatory start should match
    group 3: null
3 groups for pattern/string: 4/4
    group 0: should match with trailing negatory
    group 1: should match with trailing negatory
    group 2: should match with trailing negatory
    group 3: null
3 groups for pattern/string: 4/5
    group 0: negatory
    group 1: negatory
    group 2: negatory
    group 3: null

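None of the patterns works as intended. The goal is for every string except the last one to satisfy the matcher (ignore what the strings themselves say). It seems my intent does not fit the way I was using negative lookahead.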
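
The results for patterns 2 and 3 follow from where the lookahead is applied: both reject any string that contains negatory anywhere, not only the string that is equal to it. A minimal sketch of that behavior, using matches() instead of find() (the class and method names are illustrative and not part of the original test):

```java
import java.util.regex.Pattern;

final class LookaheadScope {
    // Pattern 2 from the test above: the lookahead is re-checked before every character.
    static final Pattern TEMPERED = Pattern.compile("^((?!negatory).)*$");
    // Pattern 3 from the test above: a single lookahead at the start scans the whole string.
    static final Pattern SCANNING = Pattern.compile("^(?!.*negatory).*$");

    // True when the entire input satisfies the pattern.
    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "should match",
            "negatory start should match",
            "should match with trailing negatory",
            "negatory"
        };
        for (String s : samples) {
            System.out.printf("%-40s tempered=%b scanning=%b%n",
                    s, accepts(TEMPERED, s), accepts(SCANNING, s));
        }
    }
}
```

Only the first sample is accepted by either pattern, which is consistent with the output above.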

Solved (thanks to stribizhev)

Solution: ^(?!negatory$).*$

What I was missing in all of my attempts was the end anchor $ inside the negative lookahead group.

You need the following regex:

^(?!negatory$).*$

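Regex explanation:

  • ^ - start of the string
  • (?!negatory$) - a negative lookahead that fails the match only when negatory runs from the start of the string all the way to its end, that is, when the whole string is exactly negatory
  • .*$ - matches zero or more characters other than a newline, up to the end of the string.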
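
As a quick check of the explanation above, here is a minimal Java sketch that runs the pattern against the five test strings from the question. The class name NegatoryFilter and the method keep are hypothetical helpers for illustration; the sketch exercises java.util.regex only and does not verify how RegexTransformer itself applies the pattern.

```java
import java.util.regex.Pattern;

final class NegatoryFilter {
    // The end anchor inside the lookahead limits the rejection to the exact value.
    static final Pattern NOT_EXACTLY_NEGATORY = Pattern.compile("^(?!negatory$).*$");

    // True when the value should be kept, i.e. it is anything other than "negatory".
    static boolean keep(String value) {
        return NOT_EXACTLY_NEGATORY.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "contains substring negatory but should match",
            "should match",
            "negatory start should match",
            "should match with trailing negatory",
            "negatory"
        };
        for (String s : samples) {
            System.out.printf("%-46s keep=%b%n", s, keep(s));
        }
    }
}
```

Every sample is kept except the string that is exactly negatory.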