RegEx ignore word preceding a character set
I'm trying to match the following strings with a RegEx:
286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)
This is my code/regex:
const matches = text.matchAll(/(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g);
for (const match of matches) {
  const rank = parseInt(match[1].replace(/[^\d]/g, ''));
  const category = match[2].trim()
  console.log(`${category} = ${rank}`)
}
However, the only parts it should match are:
286,879 in Home & Kitchen
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)
The expected output should be:
Home & Kitchen = 286879
Cardboard Cutouts = 339
Jigsaws = 2945
How can I adjust the regex so that it ignores the 100 in Home & Kitchen string?
Thanks
If you only want to exclude what is inside the parentheses, you can try this:
/^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm
and ignore the third capture group.
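As a rough sketch of how that anchored pattern could be plugged into the loop from the question (the names text, lineRegex and rows are only placeholders for illustration, and the sample input is the one from the question):

```javascript
// Sample input taken from the question.
const text = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;

// Group 1 = rank, group 2 = category, group 3 = trailing parenthetical (ignored).
// The m flag makes ^ and $ match at every line, so each line is handled on its own.
const lineRegex = /^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm;

const rows = [];
for (const match of text.matchAll(lineRegex)) {
  // Drop everything that is not a digit before parsing the rank.
  const rank = parseInt(match[1].replace(/[^\d]/g, ''), 10);
  const category = match[2].trim();
  rows.push(`${category} = ${rank}`);
}

console.log(rows.join('\n'));
```

Because the number has to sit at the start of a line, the 100 inside the parentheses can never start a match, so no lookbehind is needed with this approach.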
regex groups:
- result - one record (line) from the input
- data - the number (including ,)
- cat - the category name
- extra - ignored
JS:
- replace result with the re-ordered cat ($<cat>), = and data ($<data>)
- replace , with an empty string
// \s* (rather than \s+) lets lines without a trailing parenthetical match as well
const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s*(?:\(.+?\)?)?))$/gm;
// Alternative syntax using the RegExp constructor (backslashes must be doubled inside a string literal)
// const regex = new RegExp('(?<result>(?<data>^[\\d|,]+)(?: in )(?<cat>.+?)(?<extra>\\s*(?:\\(.+?\\)?)?))$', 'gm')
const str = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;
// Re-order the named groups: category first, then the number
const subst = `$<cat> = $<data>`;
// The substituted value will be contained in the result variable;
// /,/g removes every comma, not just the first one
const result = str.replace(regex, subst).replace(/,/g, '');
console.log('Substitution result: ', result);
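You could use 2 capture groups:
(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])
Explanation
- (?<!Top\s+) Negative lookbehind, asserts that what is directly to the left of the current position is not Top followed by 1+ whitespace chars
- \b A word boundary to prevent a partial word match
- (\d+(?:,\d+)?) Capture group 1, matches 1+ digits, optionally followed by , and 1+ digits
- \s+in\s+ Matches in between 1+ whitespace chars
- ( Capture group 2
- [^()\n]*[^\s()] Matches optional chars other than a newline, ( or ), followed by a single char that is not a whitespace char, ( or )
- ) Close group 2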
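To illustrate why the word boundary in the list above matters, here is a small sketch that runs the pattern from the question with and without \b on the first sample line. The variable names are made up for this example. Without \b, the lookbehind only rejects a match that starts right after Top, so the engine simply starts one digit later, at 00.

```javascript
const line = '286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)';

// Pattern from the question: lookbehind only, no word boundary.
const withoutBoundary = /(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g;
// Same pattern with \b added in front of the number.
const withBoundary = /(?<!Top )\b([\d,|]+) in[\s\n ]([\w&'\s]+)/g;

// Collect only the captured rank (group 1) of every match.
const ranks = (re) => [...line.matchAll(re)].map((m) => m[1]);

console.log(ranks(withoutBoundary)); // [ '286,879', '00' ]
console.log(ranks(withBoundary));    // [ '286,879' ]
```

With \b in place, a match can no longer begin in the middle of 100, because there is no word boundary between two digits.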
const regex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/;
[
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)",
  "339 in Cardboard Cutouts",
  "2,945 in Jigsaws (Toys & Games)"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // m[1] is the rank, m[2] the category; strip the thousands separator
    console.log(`${m[2]} = ${m[1].replace(/,/g, "")}`);
  }
});
Note that \s can also match a newline.
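If the input is one multi-line string rather than an array of lines, the same pattern might be applied with the g flag and matchAll, roughly as sketched below. Two things here are my own assumptions and not part of the pattern above: the names text, rankRegex and ranks, and the switch from (?:,\d+)? to (?:,\d+)* so that numbers with more than one comma (such as 1,234,567) are captured whole.

```javascript
const text = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;

// Assumption: (?:,\d+)* instead of (?:,\d+)? to allow several thousands groups.
const rankRegex = /(?<!Top\s+)\b(\d+(?:,\d+)*)\s+in\s+([^()\n]*[^\s()])/g;

// Map each category name to its numeric rank.
const ranks = {};
for (const [, digits, category] of text.matchAll(rankRegex)) {
  ranks[category] = Number(digits.replace(/,/g, ''));
}

console.log(ranks);
```

Group 2 excludes \n explicitly, so a category name cannot run over into the next line even though \s elsewhere in the pattern can match a newline.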