Preg_match string inside curly braces tags

Question

I want to grab a string between tags. My tags use curly braces.

{myTag}Here is the string{/myTag}

So far I have found #<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s, which matches tags with angle brackets <>. I don't know how to make it look for curly braces instead.

Ultimately I want to parse the whole page, collect every match, and build an array from the strings.

Here is the code:

function everything_in_tags($string, $tagname)
{
    $pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

$var = everything_in_tags($string, $tagname);

Answer 1

Replace every < and > with { and }, and change preg_match() to preg_match_all() so that multiple occurrences of text inside these tags are captured.

function everything_in_tags($string, $tagname)
{
    $pattern = "#{\s*?$tagname\b[^}]*}(.*?){/$tagname\b[^}]*}#s";
    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}


$string = '{myTag}Here is the string{/myTag} and {myTag}here is more{/myTag}';
$tagname = 'myTag';
$var = everything_in_tags($string, $tagname);

Forget what I mentioned about escaping the curly braces - I was wrong.

Answer 2

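It looks like you are building a general-purpose helper function. It is therefore important to escape any character in the tag name that has special meaning to the regex engine. To escape such characters, use preg_quote().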
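To illustrate why the escaping matters, here is a minimal sketch. The tag name my.tag and the sample text are hypothetical, chosen only because the dot is a regex metacharacter.

```php
<?php
// Hypothetical tag name containing a regex metacharacter (the dot).
$tagname = 'my.tag';

// preg_quote() escapes regex metacharacters; the second argument makes it
// escape the pattern delimiter as well.
$quoted = preg_quote($tagname, '/'); // my\.tag

$text = '{myXtag}wrong{/myXtag} {my.tag}right{/my.tag}';

// Unquoted: the dot matches any character, so {myXtag} is accepted too.
preg_match('/\{' . $tagname . '}(.*?)\{\/' . $tagname . '}/s', $text, $m);
$unquotedMatch = $m[1];

// Quoted: only the literal tag name my.tag is accepted.
preg_match('/\{' . $quoted . '}(.*?)\{\/' . $quoted . '}/s', $text, $m);
$quotedMatch = $m[1];

var_dump($quoted, $unquotedMatch, $quotedMatch);
```

Without quoting, the first capture comes from the wrong tag; with quoting, only the intended tag matches.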

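We don't know the quality of the text you are searching, or how variable your tag names are. In some cases it is important to use the u (UTF-8) pattern modifier so that Unicode characters are read correctly. The s pattern modifier tells the regex engine that the "any character" dot should also match newline characters; by default the dot does not match newlines. If you need to accommodate tag names of unknown upper/lower case, use the i pattern modifier.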
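The effect of each modifier can be checked with a small sketch like the one below. The sample strings and the short tag name t are made up for illustration. In PHP's PCRE flavour, u is the UTF-8 modifier (m is the multiline modifier and only changes how ^ and $ behave).

```php
<?php
// s: let the dot match newline characters as well.
$multiline = "{myTag}line one\nline two{/myTag}";
$withoutS = preg_match('/\{myTag}(.*?)\{\/myTag}/', $multiline);  // no match
$withS    = preg_match('/\{myTag}(.*?)\{\/myTag}/s', $multiline); // match

// i: ignore upper/lower case differences in the tag name.
$withI = preg_match('/\{myTag}(.*?)\{\/myTag}/i', '{MYTAG}x{/mytag}');

// u: treat pattern and subject as UTF-8, so the dot consumes a whole
// character instead of a single byte. "\u{e9}" is a two-byte character.
$accented = "{t}\u{e9}{/t}";
$withoutU = preg_match('/^\{t}.\{\/t}$/', $accented);  // dot eats one byte only
$withU    = preg_match('/^\{t}.\{\/t}$/u', $accented); // dot eats the character

var_dump($withoutS, $withS, $withI, $withoutU, $withU);
```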

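If the content of your curly-brace tags is guaranteed never to contain an opening curly brace, you can change (.*?) to ([^{]*) so that the regex executes more efficiently.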
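A sketch of that variant follows. The function name curlyTagContentsNoBrace is hypothetical, and the sketch assumes the tag content never contains an opening curly brace. Because a negated character class matches newlines, the s modifier is not needed here.

```php
<?php
// Hypothetical helper: assumes the tag content contains no "{".
function curlyTagContentsNoBrace(string $string, string $tagname): string
{
    $quoted = preg_quote($tagname, '/');
    // [^{]* consumes everything up to the next "{" in one greedy run.
    $pattern = '/\{' . $quoted . '}([^{]*)\{\/' . $quoted . '}/';
    return preg_match($pattern, $string, $matches) ? $matches[1] : '';
}

// Content spanning two lines is still captured.
var_dump(curlyTagContentsNoBrace("a {myTag}one\ntwo{/myTag} b", 'myTag'));

// If the assumption is violated, the pattern does not match at all.
var_dump(curlyTagContentsNoBrace('{myTag}has {brace} inside{/myTag}', 'myTag'));
```

The trade-off is visible in the second call: content that does contain a brace yields an empty string, so this form is only safe when the input is known to be clean.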

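By capturing the tag name in the opening tag and referring back to it in the closing tag, you can slightly reduce the pattern's step count and its overall length.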

Code:

$text = <<<TEXT
some text {myTag}Here is the string
on two lines{/myTag} some more text
TEXT;

function curlyTagContents(string $string, string $tagname): string
{
    // \1 refers back to the tag name captured in the opening tag.
    $pattern = '/\{(' . preg_quote($tagname, '/') . ')}(.*?)\{\/\1}/s';
    return preg_match($pattern, $string, $matches) ? $matches[2] : '';
}

var_export(
    curlyTagContents($text, 'myTag')
);

Output (the single quotes come from var_export()):

'Here is the string
on two lines'
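Since the question ultimately asks for an array of every match on the page, the same idea can be sketched with preg_match_all(). The function name allCurlyTagContents is hypothetical; the pattern combines preg_quote() with a back-reference to the captured tag name.

```php
<?php
// Hypothetical helper: returns the contents of every {tag}...{/tag} pair.
function allCurlyTagContents(string $string, string $tagname): array
{
    // Group 1 captures the tag name, \1 reuses it in the closing tag,
    // and group 2 captures the content between the tags.
    $pattern = '/\{(' . preg_quote($tagname, '/') . ')}(.*?)\{\/\1}/s';
    preg_match_all($pattern, $string, $matches);
    // With the default PREG_PATTERN_ORDER, $matches[2] holds every group 2
    // capture, or an empty array when nothing matched.
    return $matches[2];
}

$string = '{myTag}Here is the string{/myTag} and {myTag}here is more{/myTag}';
var_export(allCurlyTagContents($string, 'myTag'));
```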
