How to match some words after matching an entire sentence using regex?
I'm a newbie. I'm trying to extract the full name from either of the lines below, without the leading Obituary for
<h2>Obituary for John Doe</h2>
<h1>James Michael Lee</h1>
My regex looks like this.
(<h1>(.+?)<\/h1>|<h2>Obituary\sfor\s(.+?)<\/h2>)
What I get back is still Obituary for John Doe. How can I remove Obituary for?
Could you do something like this, leaving the prefix removal to plain string functions instead of the regex?
/**
 * @description : Function extracts names from html header tags
 * @example : "<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>" -> ["John Doe", "James Michael Lee"]
 * @param string $html
 * @return string[] : list of full names
 */
function extractFullNames($html) {
    // Capture the inner text of every <h1> or <h2> element.
    $regex = '/<h[1-2]>(.*?)<\/h[1-2]>/';
    preg_match_all($regex, $html, $matches);
    $names = $matches[1];
    $names = array_map('trim', $names);
    $names = array_map('strip_tags', $names);
    // Normalise to Title Case so the prefix always reads "Obituary For ".
    // Note: this also rewrites names such as "McDonald" to "Mcdonald".
    $names = array_map('strtolower', $names);
    $names = array_map('ucwords', $names);
    $names = array_map('removeObituary', $names);
    return $names;
}
/**
 * @description : Function used to remove "Obituary For" if present
 * @example : "Obituary For John Doe" -> "John Doe"
 * @param string $name
 * @return string : name without "Obituary For"
 */
function removeObituary($name) {
    // str_replace is case-sensitive; the caller has already title-cased $name.
    $name = str_replace("Obituary For ", "", $name);
    return $name;
}
// Test cases
$html = '<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>';
$names = extractFullNames($html);
$expected = ['John Doe', 'James Michael Lee'];
echo "Expected: " . implode(', ', $expected) . "\n";
echo "Actual: " . implode(', ', $names);
Many roads lead to Rome; you could probably do it like this:
<h(?:1>|2>Obituary\sfor\s)\K[^><]+
See this demo at regex101. The matches will be in $out[0]. \K resets the beginning of the reported match, so everything matched before it is left out of the result. See the SO Regex FAQ for more.
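As a rough sketch of how that pattern might be used from PHP: the snippet below assumes the two headings from the question sit in one string and that the result array is called $out, as in the answer. The `~` delimiter is only a convenience chosen here so that `/` needs no escaping.

```php
<?php
// Pattern from the answer above, wrapped in '~' delimiters (a choice made for this sketch).
$pattern = '~<h(?:1>|2>Obituary\sfor\s)\K[^><]+~';

// Sample input taken from the question.
$html = '<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>';

// \K drops everything matched so far from the reported match,
// so the full matches in $out[0] contain only the names.
preg_match_all($pattern, $html, $out);

print_r($out[0]);
```

Because the tag and the optional prefix are consumed before \K, no capture group is needed and nothing has to be stripped afterwards.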
I would probably do something like
/^(?:\s*<[^>]*?>)?(?:.*\s+for\s+)?([^<]*)/
and extract the first capture group.
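A minimal sketch of the capture-group approach, applied line by line. It assumes the whitespace before the opening tag is optional (written as `\s*`); the variable names $lines and $names are made up for this illustration.

```php
<?php
// Assumption: optional leading whitespace (\s*) before the opening tag.
$pattern = '/^(?:\s*<[^>]*?>)?(?:.*\s+for\s+)?([^<]*)/';

// Sample lines taken from the question.
$lines = [
    '<h2>Obituary for John Doe</h2>',
    '<h1>James Michael Lee</h1>',
];

$names = [];
foreach ($lines as $line) {
    // $m[1] is the first capture group: the text that runs up to the next '<'.
    if (preg_match($pattern, $line, $m)) {
        $names[] = trim($m[1]);
    }
}

print_r($names);
```

One caveat with this pattern: the optional `.*\s+for\s+` part is greedy, so a name that itself contains the word "for" surrounded by spaces would be cut at that point.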
Use
<h\d+>(?:Obituary\s+for\s+)?\K[^<>]+
See the regex proof.