Recursive/subroutine regex to match CSS media queries

Question

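I'm looking for a regular expression (PHP PCRE) that reliably matches media queries and their contents, including the slightly odd case where the media query body is empty. The source text might be: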

@media only screen {
    p {
        color:red;
    }
}
@media only screen and (max-width: 596px) {
    p {
        color:blue;
    }
    img {
        max-width: 200px;
    }
}
@media only screen {

}
img {
    display: block;
}
@media only screen and (max-width: 240px) {
    p {
        color:green;
    }
}
p {
    font-weight: normal;
}

I want to capture each media query together with its CSS body as a subpattern, so that I end up with a PHP array such as:

[['@media only screen {
        p {
            color:red;
        }
    }','p {
            color:red;
        }'],...]

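The key point is that this needs to be a recursive or subroutine pattern in order to balance the braces. The empty query is enough to confuse the pattern in this question, because that pattern cannot tell the end of a CSS rule from the end of an empty media query: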

/@media[^{]+\{([\s\S]+?\})\s*\}/
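A short way to see the failure is to run that pattern over a reduced version of the sample. The snippet below is only a sketch; the variable names ($css, $naive, $count, $overrun) are made up for illustration. The lazy [\s\S]+? cannot stop at the brace that closes the empty block, so the match runs on until it finds two closing braces in a row.

```php
<?php
// Reduced sample: an empty @media block, an unrelated rule, then another @media block.
$css = <<<'CSS'
@media only screen {
    p {
        color:red;
    }
}
@media only screen {

}
img {
    display: block;
}
@media only screen and (max-width: 240px) {
    p {
        color:green;
    }
}
CSS;

// The non-recursive pattern quoted above.
$naive = '/@media[^{]+\{([\s\S]+?\})\s*\}/';
preg_match_all($naive, $css, $m);

// Three @media blocks exist, but only two matches come back: the second one
// starts at the empty block and swallows the img rule and the last @media block.
$count = count($m[0]);
$overrun = strpos($m[0][1], 'img {') !== false;
var_dump($count, $overrun);
```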

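I have been trying, and failing, to follow the advice in this article and build a pattern of the form (b(?:m|(?1))*e), where b is what begins the construct, m is what can occur in the middle of it, and e is what can occur at the end, and where no two of them can match the same text.

So b should be @media[^{]+\{, e should be \}, and m needs to consume CSS rules, perhaps ([^{]+?\{[^}]*?\s*\}), which gives me: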

/(@media[^{]+\{(?:([^{]+?\{[^}]*?\}\s*)*|(?1))*\})/s

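However, this doesn't work, so I'm a bit lost. Can anyone suggest a pattern that works?

I've set up a regex test here.

Alternatively, a non-regex parser might work better.

Note that I'm not trying to validate or match CSS selectors in general (that is not a job for regex), only to get the text of each query and its body.

Update: added more sample content and explained the result I'm after.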
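As a rough sketch of what such a non-regex approach could look like, the function below walks the string and counts braces. The name extractMediaBlocks is hypothetical, and the code assumes that braces never appear inside strings or comments in the stylesheet.

```php
<?php
/**
 * Hypothetical helper: collect every @media block by counting braces.
 * Assumption: no braces occur inside CSS strings or comments.
 * Returns a list of [whole block, trimmed body] pairs.
 */
function extractMediaBlocks(string $css): array
{
    $blocks = [];
    $offset = 0;
    $len = strlen($css);
    while (($start = strpos($css, '@media', $offset)) !== false) {
        $open = strpos($css, '{', $start);
        if ($open === false) {
            break; // no opening brace after @media
        }
        $depth = 0;
        $end = null;
        for ($i = $open; $i < $len; $i++) {
            if ($css[$i] === '{') {
                $depth++;
            } elseif ($css[$i] === '}') {
                $depth--;
                if ($depth === 0) {
                    $end = $i; // brace that closes the @media block
                    break;
                }
            }
        }
        if ($end === null) {
            break; // unbalanced braces: stop rather than guess
        }
        $blocks[] = [
            substr($css, $start, $end - $start + 1),
            trim(substr($css, $open + 1, $end - $open - 1)),
        ];
        $offset = $end + 1;
    }
    return $blocks;
}

$css = "@media only screen {\n    p {\n        color:red;\n    }\n}\n@media only screen {\n\n}\nimg {\n    display: block;\n}";
$blocks = extractMediaBlocks($css);
print_r($blocks);
```

The empty media query simply yields an empty body string here, and the img rule outside any @media block is skipped.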

Answer 1

If you are sure that the blocks you want to match always have balanced braces, you can use a regex with a subroutine call, like this:

'~@media\b[^{]*({((?:[^{}]+|(?1))*)})~'

See the regex demo.

Here is an IDEONE demo:

$re = '~@media\b[^{]*({((?:[^{}]+|(?1))*)})~'; 
$str = "@media only screen {\n    p {\n        color:red;\n    }\n}\n@media only screen and (max-width: 596px) {\n    p {\n        color:blue;\n    }\n    img {\n        max-width: 200px;\n    }\n}\n@media only screen {\n\n}\nimg {\n    display: block;\n}\n@media only screen and (max-width: 240px) {\n    p {\n        color:green;\n    }\n}\np {\n    font-weight: normal;\n}"; 
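// PREG_PATTERN_ORDER groups the results by capture group number.
preg_match_all($re, $str, $matches, PREG_PATTERN_ORDER);
print_r($matches[0]); // each complete @media block
print_r($matches[2]); // the text between each block's outer braces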

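Pattern details:

@media\b - matches @media as a whole word (since \b is a word boundary)
[^{]* - matches zero or more characters other than { (the query text up to the opening brace)
({((?:[^{}]+|(?1))*)}) - capture group #1 captures a {...} block with a balanced number of { and } (note this is a technical group: the group's subpattern has to be recursed in order to match nested {...}s correctly). It matches...
- { - an opening brace
- ((?:[^{}]+|(?1))*) - group 2 (the contents inside the balanced {...}), matching zero or more repetitions of
  - [^{}]+ - 1+ characters other than { and } (since we need to match everything that is not a leading or trailing delimiter)
  - | - or...
  - (?1) - the whole group 1 subpattern
- } - a closing brace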
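To get the array shape asked for in the question (one [whole block, body] pair per media query), the same pattern can be run with PREG_SET_ORDER. This is a minimal sketch; $css is a shortened sample, and trimming the body is a choice made here, not part of the pattern.

```php
<?php
$re = '~@media\b[^{]*({((?:[^{}]+|(?1))*)})~';
$css = "@media only screen {\n    p {\n        color:red;\n    }\n}\n@media only screen {\n\n}\nimg {\n    display: block;\n}\n@media only screen and (max-width: 240px) {\n    p {\n        color:green;\n    }\n}";

// PREG_SET_ORDER returns one array per match instead of one array per group.
preg_match_all($re, $css, $sets, PREG_SET_ORDER);

$result = [];
foreach ($sets as $set) {
    // $set[0] is the whole @media block, $set[2] the text between its outer braces.
    $result[] = [$set[0], trim($set[2])];
}
print_r($result);
```

The empty media query produces a pair whose second element is an empty string, and the img rule outside any @media block is not matched.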

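Note that each entry of $matches[2] can be processed further with a pattern such as preg_match_all('~\s*(\w+)\s*{\s*([^}]*?)\s*}~', $body, $subblocks), where $body is one element of $matches[2] (preg_match_all expects a string subject, so the array cannot be passed directly). Since \w+ only matches word characters, this handles simple type selectors such as p or img.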
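The follow-up step could be sketched like this. The $bodies array stands in for $matches[2], and the selector => declarations layout of $rules is an assumption made for illustration (a repeated selector within one body would overwrite the earlier entry).

```php
<?php
// Stand-in for $matches[2]: the body of a two-rule media query and of an empty one.
$bodies = [
    "\n    p {\n        color:blue;\n    }\n    img {\n        max-width: 200px;\n    }\n",
    "\n\n",
];

$rules = [];
foreach ($bodies as $i => $body) {
    // Run the sub-block pattern on one body string at a time.
    preg_match_all('~\s*(\w+)\s*{\s*([^}]*?)\s*}~', $body, $sub, PREG_SET_ORDER);
    $rules[$i] = [];
    foreach ($sub as $s) {
        $rules[$i][$s[1]] = $s[2]; // selector => declarations
    }
}
print_r($rules);
```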
