RegEx ignore word preceding a character set

Question

I am trying to match the following strings with a regex:

286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)

This is my code/regex:

            const matches = text.matchAll(/(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g);
            for(const match of matches){
                const rank = parseInt(match[1].replace(/[^\d]/g, ''));
                const category = match[2].trim()
                console.log(`${category} = ${rank}`)
            }

However, the only parts it should match are: 286,879 in Home & Kitchen, 339 in Cardboard Cutouts, and 2,945 in Jigsaws (Toys & Games)

The expected output should be:

Home & Kitchen = 286879

Cardboard Cutouts = 339

Jigsaws = 2945

How can I adjust the regex so that it ignores the string 100 in Home & Kitchen?

Thanks

Answer 1

If you only want to exclude what is inside the parentheses, you could try this:

/^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm

and ignore the third capture group.
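
A minimal sketch of how that pattern could be applied to the sample text from the question. The parseRanks helper name is made up for illustration and is not part of the answer; the third capture group is simply never read.

```javascript
// Hypothetical helper (not from the answer): applies the pattern above
// and ignores the third capture group.
function parseRanks(text) {
  const regex = /^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm;
  const results = [];
  for (const match of text.matchAll(regex)) {
    // Strip everything that is not a digit before parsing the rank.
    const rank = parseInt(match[1].replace(/[^\d]/g, ''), 10);
    const category = match[2].trim();
    results.push(`${category} = ${rank}`);
  }
  return results;
}

const sample = [
  '286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)',
  '339 in Cardboard Cutouts',
  '2,945 in Jigsaws (Toys & Games)',
].join('\n');

console.log(parseRanks(sample).join('\n'));
```

Note that [\w&'\s]+ can also consume a newline, so on other inputs (for example when the next line starts with a rank that has no comma) a match may run into the following line. Splitting the text into lines first would avoid that.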

Answer 2

Regex groups:

result - one record (line) from the input
data - the number (including ,)
cat - the category name
extra - ignored

JS

Replace result with the reordered cat ($<cat>), " = " and data ($<data>)
Replace , with an empty string

const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$/gm;

// Alternative syntax using the RegExp constructor (backslashes must be doubled inside a string literal)
// const regex = new RegExp('(?<result>(?<data>^[\\d|,]+)(?: in )(?<cat>.+?)(?<extra>\\s+(?:\\(.+?\\)?)?))$', 'gm')

const str = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)`;
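// $<cat> and $<data> refer to the named groups in the regex
const subst = `$<cat> = $<data>`;

// The substituted value will be contained in the result variable;
// the global /,/g removes every thousands separator, not only the first
const result = str.replace(regex, subst).replace(/,/g, '');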

console.log('Substitution result: ', result);
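
If the goal is to collect the values rather than rewrite the string, the same named groups can also be read from match.groups. A minimal sketch, assuming the regex above; extractRanks is a hypothetical helper name. The sample keeps the trailing spaces from the question, because the \s+ at the start of the extra group needs at least one whitespace character after the category on the same line.

```javascript
// Hypothetical helper (not from the answer): reads the named groups
// instead of substituting them into a replacement string.
function extractRanks(text) {
  const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$/gm;
  return Array.from(text.matchAll(regex), (m) => ({
    category: m.groups.cat,
    // Remove separators so the rank can be stored as a number.
    rank: Number(m.groups.data.replace(/[^\d]/g, '')),
  }));
}

// Trailing spaces are written explicitly so they are not lost by an editor.
const input = [
  '286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  ',
  '339 in Cardboard Cutouts    ',
  '2,945 in Jigsaws (Toys & Games)',
].join('\n');

for (const { category, rank } of extractRanks(input)) {
  console.log(`${category} = ${rank}`);
}
```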

Answer 3

You could use 2 capture groups:

(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])

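Explanation

(?<!Top\s+) Negative lookbehind, asserts that what is directly to the left of the current position is not Top followed by 1+ whitespace characters
\b A word boundary to prevent a partial word match
(\d+(?:,\d+)?) Capture group 1, match 1+ digits, optionally followed by , and 1+ digits
\s+in\s+ Match in between 1+ whitespace characters
( Capture group 2
- [^()\n]*[^\s()] Match 0+ characters other than ( ) or a newline, then one character other than whitespace, ( or )
) Close group 2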
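
To see what the lookbehind contributes, the pattern can be compared with a copy that omits it. This is only an illustrative sketch; the variable names and the found helper are made up here, and the g flag is added so that every match on the line is reported.

```javascript
// The pattern from this answer, with and without the negative lookbehind.
const withLookbehind = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/g;
const withoutLookbehind = /\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/g;

const line = '286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)';

// Hypothetical helper: collects the full text of every match on the line.
const found = (re) => Array.from(line.matchAll(re), (m) => m[0]);

// Without the lookbehind, "100 in Home & Kitchen" is matched as well.
console.log(found(withoutLookbehind));
// With the lookbehind, only "286,879 in Home & Kitchen" remains.
console.log(found(withLookbehind));
```

The word boundary matters here too: without \b the engine could skip the 1 and still match 00 in Home & Kitchen, since 00 is not directly preceded by Top and whitespace.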


const regex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/;

[
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)",
  "339 in Cardboard Cutouts",
  "2,945 in Jigsaws (Toys & Games)"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(`${m[2]} = ${m[1].replace(",", "")}`)
  }
})

Note that \s can also match a newline.
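
To process the whole text in one pass instead of line by line, the same pattern can be given the g flag and used with matchAll. A minimal sketch under that assumption; the variable names are illustrative only.

```javascript
// Same pattern as above, with the g flag so matchAll can be used.
const regex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/g;

const text = [
  '286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)',
  '339 in Cardboard Cutouts',
  '2,945 in Jigsaws (Toys & Games)',
].join('\n');

// Build "category = rank" for every match, removing the thousands separator.
const lines = Array.from(
  text.matchAll(regex),
  (m) => `${m[2]} = ${m[1].replace(/,/g, '')}`
);

console.log(lines.join('\n'));
```

Group 2 excludes \n, so a category never runs past the end of its line, while the \s+ around in would still accept a line break between the rank and the category.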


