Solr DataImportHandler RegexTransformer usage: skip field by pattern
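I am reading the values of column_name from table_name into the field fieldName of my entity, and I want to exclude one particular value, negatory. I found that I can do this with a conditional expression in SQL, for example: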
select
CASE WHEN column_name = 'negatory'
THEN null
ELSE column_name
END AS fieldName
from table_name
However, I cannot get my original implementation, which uses RegexTransformer, to work (pseudocode data-config.xml):
<entity query="select column_name from table_name">
<field column="fieldName" sourceColName="column_name" regex="^(?!negatory)$" />
</entity>
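I think the SQL solution is fine (probably better), but I am curious how, or whether, this can be done with RegexTransformer.
Update (incorporating suggestions from stribizhev)
The more I think about it, the less I believe a negative lookahead can do what I want. I ran a quick test: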
Pattern[] pp = new Pattern[] {
Pattern.compile("^(?!negatory)$"), // original attempt, only matches empty string
Pattern.compile("^((?!negatory).)*$"), // suggestion from stribizhev
Pattern.compile("^(?!.*negatory).*$"), // suggestion from stribizhev
Pattern.compile("^((?!negatory)|(.+(?=negatory)?.*)|(.*(?=negatory)?.+))$") // latest attempt
};
String[] ss = new String[] {
"contains substring negatory but should match",
"should match",
"negatory start should match",
"should match with trailing negatory",
"negatory"
};
int pi = 0;
for (Pattern p : pp) {
++pi;
int si = 0;
for (String s : ss) {
++si;
Matcher m = p.matcher(s);
if (m.find()) {
int count = m.groupCount();
System.out.println(String.format("%s groups for pattern/string: %s/%s", count, pi, si));
for (int i = 0; i <= count; ++i) {
System.out.println(String.format("\tgroup %s: %s", i, m.group(i)));
}
} else {
System.out.println(String.format("no match for pattern/string: %s/%s", pi, si));
}
}
}
which produced the following results:
no match for pattern/string: 1/1
no match for pattern/string: 1/2
no match for pattern/string: 1/3
no match for pattern/string: 1/4
no match for pattern/string: 1/5
no match for pattern/string: 2/1
1 groups for pattern/string: 2/2
group 0: should match
group 1: h
no match for pattern/string: 2/3
no match for pattern/string: 2/4
no match for pattern/string: 2/5
no match for pattern/string: 3/1
0 groups for pattern/string: 3/2
group 0: should match
no match for pattern/string: 3/3
no match for pattern/string: 3/4
no match for pattern/string: 3/5
3 groups for pattern/string: 4/1
group 0: contains substring negatory but should match
group 1: contains substring negatory but should match
group 2: contains substring negatory but should match
group 3: null
3 groups for pattern/string: 4/2
group 0: should match
group 1: should match
group 2: should match
group 3: null
3 groups for pattern/string: 4/3
group 0: negatory start should match
group 1: negatory start should match
group 2: negatory start should match
group 3: null
3 groups for pattern/string: 4/4
group 0: should match with trailing negatory
group 1: should match with trailing negatory
group 2: should match with trailing negatory
group 3: null
3 groups for pattern/string: 4/5
group 0: negatory
group 1: negatory
group 2: negatory
group 3: null
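None of the patterns works as intended; the goal is for every string except the last one to satisfy the matcher (ignore what the strings themselves say). It seems my intent does not fit the intended use of a negative lookahead.
Solved (thanks to stribizhev)
Solution: ^(?!negatory$).*$
What I was missing in all of my attempts was the end anchor $ inside the negative lookahead group.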
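As a quick sanity check of the solution above, the following sketch runs the pattern against the same five test strings from the question. The class name NegatoryCheck and the helper accepts are hypothetical names chosen for illustration; only java.util.regex from the standard library is used, and find() is used as in the test above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Solr: checks the solved pattern in isolation.
class NegatoryCheck {
    // Pattern from the solution: rejects only the exact value "negatory".
    static final Pattern NOT_NEGATORY = Pattern.compile("^(?!negatory$).*$");

    // Returns true when the value satisfies the matcher, i.e. should be kept.
    public static boolean accepts(String value) {
        return NOT_NEGATORY.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "contains substring negatory but should match",
            "should match",
            "negatory start should match",
            "should match with trailing negatory",
            "negatory"
        };
        for (String s : samples) {
            // Prints true for the first four strings and false for the last one.
            System.out.println(accepts(s) + " <- \"" + s + "\"");
        }
    }
}
```

Only the bare value negatory is rejected; strings that merely contain, start with, or end with it still match.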
You need the following regex:
^(?!negatory$).*$
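See demo
Regex explanation:
^ - start of the string
(?!negatory$) - a negative lookahead that fails the match if negatory begins at the start of the string and runs to its end, i.e. if the whole string is exactly negatory
.*$ - matches zero or more characters other than a newline, up to the end of the string.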
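To make the role of each part concrete, here is a minimal sketch that contrasts three placements of the anchors. The class name LookaheadAnchors and the helper found are hypothetical and exist only for this illustration; the patterns are the original attempt, a variant without $ inside the lookahead, and the regex above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares where the $ anchor sits relative to the lookahead.
class LookaheadAnchors {
    // Returns true if the regex finds a match anywhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String[] regexes = {
            "^(?!negatory)$",    // nothing is consumed between ^ and $: only "" can match
            "^(?!negatory).*$",  // rejects every string that starts with "negatory"
            "^(?!negatory$).*$"  // rejects only the string that is exactly "negatory"
        };
        String[] inputs = { "", "should match", "negatory start should match", "negatory" };
        for (String regex : regexes) {
            for (String input : inputs) {
                System.out.println(regex + " on \"" + input + "\": " + found(regex, input));
            }
        }
    }
}
```

The lookahead itself consumes no characters, so the first pattern requires the start and the end of the string to coincide. Moving $ inside the lookahead narrows the rejected case from "starts with negatory" to "is exactly negatory", and .* then consumes whatever value is there.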