RegEx ignore word preceding a character set

I'm trying to match the following strings with a RegEx:

286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)

Here is my code/regex:

const matches = text.matchAll(/(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g);
for (const match of matches) {
    const rank = parseInt(match[1].replace(/[^\d]/g, ''), 10);
    const category = match[2].trim();
    console.log(`${category} = ${rank}`);
}

However, the only parts it should match are:

286,879 in Home & Kitchen
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)

The expected output is:

Home & Kitchen = 286879

Cardboard Cutouts = 339

Jigsaws = 2945

How can I adjust the regex so that it ignores the 100 in Home & Kitchen string?

Thanks
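Why the lookbehind alone is not enough: (?<!Top ) only checks the text directly to the left of where the number starts, so the engine can simply start one character later, at the first 0 of 100, where the left context is "op 1". A quick sketch that runs the question's pattern over the sample (trailing spaces dropped, numbers is just a hypothetical name) shows this, plus a second issue: \s inside group 2 lets a match run onto the next line.

```javascript
// The question's pattern applied to the sample input (trailing spaces removed).
const text = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;

const questionRegex = /(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g;

// Collect what group 1 (the number) captured for every match.
const numbers = [...text.matchAll(questionRegex)].map(m => m[1]);
console.log(numbers);
// numbers: ["286,879", "00", "339", ",945"]
// "00" comes from "Top 100": the lookbehind rejects a start at "1" but not at the first "0".
// ",945" appears because [\w&'\s]+ on the "339" line crossed the newline and consumed the "2".
```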

If you only want to exclude the part in parentheses, you could try this:

/^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm

and ignore the third capture group.
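Plugged into the question's loop, that could look like the sketch below. The sample text is inlined with its trailing spaces dropped, and lines is a hypothetical collector for the output.

```javascript
// Sample input from the question (trailing spaces dropped).
const text = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;

// With the m flag, ^ and $ anchor each line; group 3 takes an optional "(...)" suffix.
const regex = /^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm;

const lines = [];
for (const match of text.matchAll(regex)) {
  const rank = parseInt(match[1].replace(/[^\d]/g, ""), 10);
  const category = match[2].trim(); // group 2 can end with the space before "("
  // match[3] (the parenthesised part) is deliberately ignored
  lines.push(`${category} = ${rank}`);
}
console.log(lines.join("\n"));
```

Because every match has to start at the beginning of a line, the 100 in Home & Kitchen inside the parentheses can no longer start a match.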

Regex groups:
  1. result - one record (line) from the input
  2. data - the number (including ,)
  3. cat - the category name
  4. extra - ignored
JS
  • Replace result with the reordered groups cat ($<cat>) = data ($<data>)
  • Replace every , with an empty string
const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$/gm;

// Alternative syntax using the RegExp constructor (String.raw keeps the backslashes that a plain '...' string literal would drop)
// const regex = new RegExp(String.raw`(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$`, 'gm');

const str = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)`;
const subst = `$<cat> = $<data>`;

// Substitute each record, then strip every thousands separator
// (a string pattern such as ',' would only replace the first comma)
const result = str.replace(regex, subst).replace(/,/g, '');

console.log('Substitution result: ', result);
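If the values are needed as data rather than as one reformatted string, the same named groups can be read through matchAll. A sketch that keeps the answer's regex unchanged; ranks, category and rank are hypothetical names. The sample lines keep their trailing spaces because (?<extra>\s+...) needs at least one whitespace character before the end of a line that has no parenthesised part.

```javascript
// The answer's regex, unchanged.
const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$/gm;

// Trailing spaces on the first two lines are kept on purpose (see above).
const str = [
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  ",
  "339 in Cardboard Cutouts    ",
  "2,945 in Jigsaws (Toys & Games)",
].join("\n");

// Read the named groups directly instead of building a substituted string.
const ranks = [...str.matchAll(regex)].map(m => ({
  category: m.groups.cat,
  rank: Number(m.groups.data.replace(/,/g, "")), // strip every thousands separator
}));

console.log(ranks);
```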

You might use 2 capture groups:

(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])

Explanation

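  • (?<!Top\s+) Negative lookbehind, assert that Top followed by 1+ whitespace chars is not directly to the left of the current position
  • \b A word boundary to prevent a partial word match (without it, the 00 in Top 100 could still match)
  • (\d+(?:,\d+)?) Capture group 1, match 1+ digits, optionally followed by , and 1+ digits
  • \s+in\s+ Match in between 1+ whitespace chars
  • ( Capture group 2
    • [^()\n]*[^\s()] Match optional chars other than ( ) or a newline, then end on a char that is not whitespace, ( or )
  • ) Close group 2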


const regex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/;

[
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)",
  "339 in Cardboard Cutouts",
  "2,945 in Jigsaws (Toys & Games)"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // m[2] is the category, m[1] the rank without its thousands separator
    console.log(`${m[2]} = ${m[1].replace(",", "")}`);
  }
});

Note that \s can also match a newline, so \s+in\s+ could span a line break when the whole text is searched at once.
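The code above tests each line separately. Running the pattern once over the whole multiline string with the g flag also works on this sample, because group 2 excludes \n. A sketch under two assumptions: the group 1 quantifier is changed from ? to * (with ?, a rank like 1,234,567 would be matched as 234,567), and the fourth input line is hypothetical, added only to exercise that case.

```javascript
// Variant of the answer's pattern: (?:,\d+)* instead of (?:,\d+)? so ranks with
// several thousands separators are captured whole.
const regex = /(?<!Top\s+)\b(\d+(?:,\d+)*)\s+in\s+([^()\n]*[^\s()])/g;

// Sample input; the last line is a hypothetical extra case.
const text = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)
1,234,567 in Toys`;

// One pass over the whole string; m[2] is the category, m[1] the rank.
const output = [...text.matchAll(regex)].map(
  m => `${m[2]} = ${m[1].replace(/,/g, "")}`
);
console.log(output.join("\n"));
```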