How to match some words after matching an entire sentence using regex?

Question

I'm new to this. I'm trying to find the full name in either of the lines below, but without "Obituary for":

<h2>Obituary for John Doe</h2>
<h1>James Michael Lee</h1>

My regex looks like this:

(<h1>(.+?)<\/h1>|<h2>Obituary\sfor\s(.+?)<\/h2>)

What I get is still "Obituary for John Doe". How do I remove "Obituary for"?

Answer 1

Could you do something like this, instead of handling everything in a single regex?

/**
 * Extracts names from HTML header tags.
 *
 * @example "<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>" -> ["John Doe", "James Michael Lee"]
 * @param string $html
 * @return string[] list of full names
 */
function extractFullNames($html) {
    // Capture the contents of every <h1> or <h2> element.
    $regex = '/<h[1-2]>(.*?)<\/h[1-2]>/';
    preg_match_all($regex, $html, $matches);
    $names = $matches[1];
    $names = array_map('trim', $names);
    $names = array_map('strip_tags', $names);
    // Normalize casing to Title Case so the prefix always reads "Obituary For ".
    // Note: this also rewrites names such as "McDonald" to "Mcdonald".
    $names = array_map('strtolower', $names);
    $names = array_map('ucwords', $names);
    $names = array_map('removeObituary', $names);
    return $names;
}

/**
 * Removes "Obituary For " if present.
 *
 * @example "Obituary For John Doe" -> "John Doe"
 * @param string $name
 * @return string name without "Obituary For "
 */
function removeObituary($name) {
    $name = str_replace("Obituary For ", "", $name);
    return $name;
}

// Test cases
$html = '<h2>Obituary for John Doe</h2><h1>James Michael Lee</h1>';
$names = extractFullNames($html);
$expected = ['John Doe', 'James Michael Lee'];

echo "Expected: " . implode(', ', $expected) . "\n";
echo "Actual: " . implode(', ', $names);

Answer 2

There are many ways to do this. You could probably use something like:

<h(?:1>|2>Obituary\sfor\s)\K[^><]+

See this demo at regex101. The matches will be in $out[0].

\K resets the beginning of the reported match, so everything matched before it is left out of the result. See the SO Regex FAQ for more.
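
As a minimal sketch of how this pattern might be applied in PHP: the sample input is copied from the question, and the variable names ($html, $pattern, $names) and the `~` delimiter are choices made here for illustration, not part of the original answer.

```php
<?php
// Sample input taken from the question.
$html = "<h2>Obituary for John Doe</h2>\n<h1>James Michael Lee</h1>";

// "~" is used as the delimiter so that "/" would not need escaping.
$pattern = '~<h(?:1>|2>Obituary\sfor\s)\K[^><]+~';

preg_match_all($pattern, $html, $out);

// Because of \K, each full match in $out[0] starts after the tag and prefix.
$names = $out[0];
print_r($names);
```

No capture group is needed here, since \K discards the tag and the "Obituary for " prefix from the reported match.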

Answer 3

I would probably do something like

/^(?:\s*<[^>]*?>)?(?:.*\s+for\s+)?([^<]*)/

and extract the first capture group.
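
A rough sketch of that approach applied line by line, assuming each heading sits on its own line as in the question. This sketch uses `\s*` before the opening tag so that a line starting directly with the tag is still consumed by the first optional group. The names $lines, $pattern and $names are illustrative only.

```php
<?php
// One heading per line, as in the question.
$lines = [
    '<h2>Obituary for John Doe</h2>',
    '<h1>James Michael Lee</h1>',
];

// \s* lets the optional opening tag match at the very start of the line.
$pattern = '/^(?:\s*<[^>]*?>)?(?:.*\s+for\s+)?([^<]*)/';

$names = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m)) {
        // Group 1 holds the text after the tag and the optional "... for " prefix.
        $names[] = trim($m[1]);
    }
}
print_r($names);
```

Note that `.*` is greedy, so a name that itself contains the word "for" surrounded by whitespace would be cut at the last such occurrence.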

Answer 4

Use

<h\d+>(?:Obituary\s+for\s+)?\K[^<>]+

See the regex proof.
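
Unlike a pattern tied to h1 and h2, this one accepts any heading level and treats the prefix as optional everywhere. A small sketch to illustrate: the `<h3>` line and the name "Jane Roe" are made-up sample data, not from the question.

```php
<?php
// Hypothetical input: an <h3> with the prefix, plus the <h1> from the question.
$html = '<h3>Obituary for Jane Roe</h3><h1>James Michael Lee</h1>';

// <h\d+> matches any heading level; the prefix group is optional.
preg_match_all('~<h\d+>(?:Obituary\s+for\s+)?\K[^<>]+~', $html, $out);

$names = $out[0];
print_r($names);
```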

php

regex