RegEx to split and remember match (elegantly)

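I am trying to write a parser for responses from the Wikipedia API. The output is really messy, and I have resorted to good old RegEx to clean up most of it. However, I am stuck on this. Consider a string: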

 var a ="[[December 1]]  A triangular [[Conjunction (astronomy)|conjunction]] formed by a new Moon, Venus and Jupiter is a [[Conjunction (astronomy)#2008|prominent sight]] in the evening sky. [[December 2]]";

I would like the text in this string to become:

 "December 1  A triangular conjunction formed by a new Moon, Venus and Jupiter is a prominent sight in the evening sky. December 2"

I cannot work out how to handle the square brackets [ and ]. I can't simply remove them, because links are written like this:

 [[Conjunction (astronomy)#2008|prominent sight]]

In this case I want to drop the part "Conjunction (astronomy)#2008", because the actual string displayed on the web page is "prominent sight". Is there an elegant way to handle both cases in a single str.replace(//gi,"") call in JavaScript?

I tried a.replace(/\[\[.*\|/gi, ""); and it produces:

 "prominent sight]] in the evening sky. [[December 2]]"

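The reason this attempt swallows most of the string is that .* is greedy: it runs to the end of the line and then backtracks only as far as the last "|". The sketch below reproduces that with the string from the question; the names greedy and attempt are only illustrative.

```javascript
// Same input as in the question.
const a = "[[December 1]]  A triangular [[Conjunction (astronomy)|conjunction]] formed by a new Moon, Venus and Jupiter is a [[Conjunction (astronomy)#2008|prominent sight]] in the evening sky. [[December 2]]";

// .* is greedy: it consumes the rest of the line, then backs up to the LAST "|".
const greedy = a.match(/\[\[.*\|/)[0];

// The single match starts at the first "[[" and ends just after the last "|".
console.log(greedy.length === a.lastIndexOf("|") + 1);

// Removing that one match leaves only the tail of the string.
const attempt = a.replace(/\[\[.*\|/gi, "");
console.log(attempt);
```
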
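Clearly I need to match the consecutive opening and closing brackets [[ ]] as a pattern and then remember the part of the match I want to keep. I am not sure how to do that, but I hope the two cases are clear:

  1. [[ normal word ]] -> normal word
  2. [[ some definition blah |foo bar]] -> foo bar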

You can use String#replace with a callback and the following regex:

/\[{2}([\w\s()#]+)(?:\|([\w\s]+))?\]{2}/

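Regex explanation:

  1. \[{2}([\w\s()#]+):
    • \[{2}: matches [[
    • ([\w\s()#]+): matches any alphanumeric character, _, whitespace, (, ), and # one or more times and puts the result in the first capture group.
  2. (?:\|([\w\s]+))?\]{2}:
    • (?:...)?: non-capturing group; the trailing ? makes the whole group optional
    • \|: matches the pipe symbol | literally
    • ([\w\s]+): matches alphanumeric characters, _, and whitespace one or more times and puts the result in the second capture group
    • \]{2}: matches ]]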
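
To see what the two capture groups described above actually hold, here is a small sketch that lists them for each link. The shortened sample string and the variable names are made up for illustration; String#matchAll needs an ES2020 runtime.

```javascript
const regex = /\[{2}([\w\s()#]+)(?:\|([\w\s]+))?\]{2}/g;

// Shortened sample with one plain link and one piped link (illustrative only).
const sample = "[[December 1]] and [[Conjunction (astronomy)#2008|prominent sight]]";

// Collect [group 1, group 2] per link; group 2 is undefined when there is no "|label" part.
const groups = [...sample.matchAll(regex)].map(m => [m[1], m[2]]);

console.log(groups);
```

Because the second group is undefined for plain links, the replace callback can simply fall back to the first group.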

Demo:

var regex = /\[{2}([\w\s()#]+)(?:\|([\w\s]+))?\]{2}/g;
var str = "[[December 1]]  A triangular [[Conjunction (astronomy)|conjunction]] formed by a new Moon, Venus and Jupiter is a [[Conjunction (astronomy)#2008|prominent sight]] in the evening sky. [[December 2]]";

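// c is the whole match, m1 the link target, m2 the optional label after "|".
str = str.replace(regex, function(c, m1, m2) {
  // Prefer the label when the link has one; otherwise keep the target text.
  return m2 ? m2 : m1;
});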

document.body.innerHTML = '<pre>' + str + '</pre>';


This regex works for the string I mentioned in the question, but it does not work for the string I put up in the comment:

"A [[2008 Iwate-Miyagi Nairiku earthquake|6.9 magnitude earthquake]] in Iwate Prefecture, Japan, kills 12 and injures more than 400."

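A quick way to see why the first regex misses this string is to list the characters inside the link that its character classes do not allow. This is only a diagnostic sketch; the names strict, quake, inner and rejected are illustrative.

```javascript
// The regex from the first answer (no g flag needed for a single test).
const strict = /\[{2}([\w\s()#]+)(?:\|([\w\s]+))?\]{2}/;
const quake = "A [[2008 Iwate-Miyagi Nairiku earthquake|6.9 magnitude earthquake]] in Iwate Prefecture, Japan, kills 12 and injures more than 400.";

// No match at all, so String#replace would return the input unchanged.
console.log(strict.test(quake));

// Text between "[[" and "]]".
const inner = quake.slice(quake.indexOf("[[") + 2, quake.indexOf("]]"));

// Characters that fall outside [\w\s()#] (the "|" separator is handled separately).
const rejected = inner.match(/[^\w\s()#|]/g);
console.log(rejected);
```

The hyphen in "Iwate-Miyagi" and the dot in "6.9" are the culprits, so any fix has to accept arbitrary characters inside the brackets.
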
You can use either of the following regexes.

\[{2}([^|]*?)(?:\|(.*?))?\]{2}

\[\[(?:([^|]*)|[^|]*\|(.*?))\]\]

You can use this and replace each match with the captured text. See the demo.

https://regex101.com/r/iJ7bT6/9
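
For reference, here is a sketch of how the two patterns above could be applied in JavaScript. The helper names viaCallback and viaString are hypothetical, and the "$1$2" replacement string is an assumption on my part, since the replacement is not spelled out above.

```javascript
const inputs = [
  "[[December 1]]  A triangular [[Conjunction (astronomy)|conjunction]] formed by a new Moon, Venus and Jupiter is a [[Conjunction (astronomy)#2008|prominent sight]] in the evening sky. [[December 2]]",
  "A [[2008 Iwate-Miyagi Nairiku earthquake|6.9 magnitude earthquake]] in Iwate Prefecture, Japan, kills 12 and injures more than 400."
];

// First pattern: keep the label (group 2) when present, otherwise the target (group 1).
const lazy = /\[{2}([^|]*?)(?:\|(.*?))?\]{2}/g;
const viaCallback = s =>
  s.replace(lazy, (match, target, label) => (label !== undefined ? label : target));

// Second pattern: only one of the two groups takes part in any match, and an
// unmatched group expands to "" in a replacement string.
// ASSUMPTION: "$1$2" is my guess at the intended replacement.
const alt = /\[\[(?:([^|]*)|[^|]*\|(.*?))\]\]/g;
const viaString = s => s.replace(alt, "$1$2");

for (const s of inputs) {
  console.log(viaCallback(s));
  console.log(viaString(s));
}
```

One caveat: the greedy [^|]* in the second pattern can span two neighbouring links when no | sits between them (for example "[[a]] x [[b]]" comes out as "a]] x [[b"), so the first pattern with its lazy quantifiers is probably the safer choice for general input.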