PHP regex to find a pattern and wrap it in anchor tags
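I have a string containing movie titles and their release years. I want to detect the Title (Year) pattern and, when it matches, wrap it in anchor tags.
Wrapping it is the easy part. But is it possible to write a regex that matches this pattern when I don't know the movie titles in advance?
Example: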
$str = 'A random string with movie titles in it.
Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
The movies could be anywhere in this string.
And some movies like 28 Days Later (2002) could start with a number.';
So the pattern will always be a Title (starting with an uppercase letter) that ends with (Year).
Here is what I have so far:
if(preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str)){
error_log('MATCH');
}
else{
error_log('NO MATCH');
}
This doesn't currently work. As I understand it, this is what it should do:
^\p{Lu} //match a word beginning with an uppercase letter
[\w%+\/-] //with any number of characters following it
+\([0-9]+\) //ending with an integer
Where am I going wrong?
The following regex should do it:
(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)
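Explanation
(?-i)
Turn case-insensitive matching off, so the pattern is case-sensitive
(?<=[a-z]\s)
Lookbehind: the match must be preceded by a lowercase letter followed by a whitespace character
[A-Z\d]
Match one uppercase letter or one digit
.*?
Match any characters (except newline), as few as possible
\(\d+\)
Match one or more digits enclosed in literal parentheses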
PHP
<?php
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';
$str = 'A random string with movie titles in it.
Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
The movies could be anywhere in this string.
And some movies like 28 Days Later (2002) could start with a number.';
preg_match_all($regex, $str, $matches);
print_r($matches);
?>
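Because the lookbehind requires a lowercase letter and a space immediately before the match, and `.*?` runs from the first qualifying capital or digit up to the next parenthesised number, it can help to check how the pattern behaves on inputs that differ from the sample. The sketch below uses two made-up sentences (they are assumptions, not part of the question) and collects the matches for each.

```php
<?php
// Pattern copied from the answer above; the sample sentences are hypothetical.
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';

$samples = [
    // Title at the very start: nothing precedes it, so the lookbehind cannot succeed.
    'Alien (1979) opens the list.',
    // A capitalised word before the title: the lazy match starts at "Tom".
    'We met Tom and then saw The Thing (1984).',
];

$results = [];
foreach ($samples as $text) {
    preg_match_all($regex, $text, $m);
    $results[$text] = $m[0]; // full-pattern matches only
}

print_r($results);
```

The first sentence yields no match, and in the second the match starts at "Tom" rather than at "The", so this pattern fits best when titles never open the string and no other capitalised words precede them.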
This regex does the job:
~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~
Explanation:
~ : regex delimiter
(?: : start non capture group
[A-Z] : 1 capital letter (use \p{Lu} with the u modifier to match titles in any language)
[a-zA-Z]+ : 1 or more letters (use \p{L} with the u modifier to match titles in any language)
\s+ : 1 or more spaces
| : OR
\d+ : 1 or more digits
\s+ : 1 or more spaces
)+ : end group, repeated 1 or more times
\(\d+\) : 1 or more digits surrounded by parentheses (use \d{4} if the year is always 4 digits)
~ : regex delimiter
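The note about \p{Lu} and \p{L} can be sketched as follows. This is a minimal comparison, assuming the script file and the subject string are UTF-8 encoded; the sample sentence is made up for illustration, and the year is narrowed to \d{4} as the explanation suggests.

```php
<?php
// Assumption: this file is saved as UTF-8, which the "u" modifier requires.
$ascii   = '~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d{4}\)~';
$unicode = '~(?:\p{Lu}\p{L}+\s+|\d+\s+)+\(\d{4}\)~u';

// Hypothetical sample with accented titles.
$str = 'Films such as Amélie (2001) and Léon (1994).';

preg_match_all($ascii, $str, $asciiMatches);
preg_match_all($unicode, $str, $unicodeMatches);

print_r($asciiMatches[0]);   // ASCII-only classes stop at the accented letters
print_r($unicodeMatches[0]); // Unicode properties cover them
```

The ASCII version finds nothing here because the accented letters fall outside [a-zA-Z], while the Unicode version picks up both titles.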
Implementation:
$str = 'A random string with movie titles in it.
Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
The movies could be anywhere in this string.
And some movies like 28 Days Later (2002) could start with a number.';
if (preg_match_all('~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~', $str, $match)) {
print_r($match);
error_log('MATCH');
}
else{
error_log('NO MATCH');
}
Result:
Array
(
[0] => Array
(
[0] => The Thing (1984)
[1] => Captain America Civil War (2016)
[2] => 28 Days Later (2002)
)
)
MATCH
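The question also asks for the matches to be wrapped in anchor tags. One possible way to do that is preg_replace_callback with the pattern above, sketched here under a few assumptions: the function name linkMovieTitles and the /movies?title=...&year=... URL scheme are hypothetical, and the two capture groups plus \d{4} are additions to the original pattern.

```php
<?php
// Hypothetical helper: wraps every "Title (Year)" occurrence in an <a> tag.
function linkMovieTitles(string $text): string
{
    // Same pattern as above, with the title and the year captured separately.
    $pattern = '~((?:[A-Z][a-zA-Z]+\s+|\d+\s+)+)\((\d{4})\)~';

    $linked = preg_replace_callback($pattern, function (array $m): string {
        $title = trim($m[1]); // group 1 keeps the trailing whitespace
        // Assumed URL scheme; replace it with your own route.
        $href = '/movies?title=' . rawurlencode($title) . '&year=' . $m[2];

        return '<a href="' . htmlspecialchars($href, ENT_QUOTES) . '">'
            . htmlspecialchars($m[0], ENT_QUOTES) . '</a>';
    }, $text);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $linked ?? $text;
}

echo linkMovieTitles('Movies like The Thing (1984) and 28 Days Later (2002).'), PHP_EOL;
```

Escaping both the href and the link text with htmlspecialchars keeps the output safe if the source string ever contains characters such as & or quotes.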