Remove parts from links in a string
How can I change all links in a string from:
...<p><a href="https://www.somesite.com/url?q=http://www.someothersite.se/&q1=xxx&q2=xxx">Some text</a>...
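to:
...<p><a href="http://www.someothersite.se/">Some text</a>...
"..." means there is a lot of other code around the link. The string contains several links like this, and they all have the same form.
I guess something like this could be used: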
$new_url = preg_replace('%<a href=".*?\?q=(.*?)&.*?">(.*?)</a>%im', '<a href="$1">$2</a>', $old_url);
Working solution:
$regex = <<<EOF
%(<[aA]\s[^>]*href=['"])([^"']+url\?q=([A-z]+:\/{2}[^"'&]+)[^"']*)(["'][^>]*>)%im
EOF;
$replacement = '$1$3$4'; // keep groups 1, 3 and 4; drop the rest of group 2
$html = <<<EOF
...<p><a href="https://www.somesite.com/url?q=http://www.secondsite.se/&q1=xxx&q2=xxx">Some text</a>...
...<p><a class="lnk" href="https://www.somesite.com/url?q=http://www.thirdsite.se" id="lnk">Some text</a>...
...<p><a class="lnk2" href="https://www.somesite.com/">Some text</a>...
EOF;
$new_html = preg_replace($regex, $replacement, $html);
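As a quick check, the pattern can be run against the three sample links. This is only a minimal sketch: it assumes the replacement string is '$1$3$4' (groups 1, 3 and 4), and the nowdoc syntax and the $count variable are additions for illustration, not part of the original answer.

```php
<?php
// Nowdoc (<<<'EOF') keeps backslashes and quotes literal.
$regex = <<<'EOF'
%(<[aA]\s[^>]*href=['"])([^"']+url\?q=([A-z]+:\/{2}[^"'&]+)[^"']*)(["'][^>]*>)%im
EOF;

$html = <<<'EOF'
<p><a href="https://www.somesite.com/url?q=http://www.secondsite.se/&q1=xxx&q2=xxx">Some text</a>
<p><a class="lnk" href="https://www.somesite.com/url?q=http://www.thirdsite.se" id="lnk">Some text</a>
<p><a class="lnk2" href="https://www.somesite.com/">Some text</a>
EOF;

// Keep groups 1, 3 and 4; $count receives the number of replacements made.
$new_html = preg_replace($regex, '$1$3$4', $html, -1, $count);

echo $new_html, PHP_EOL;
echo "links rewritten: $count", PHP_EOL;
```

With this input the first two links end up pointing at http://www.secondsite.se/ and http://www.thirdsite.se, the third link has no url?q= part and is left untouched, and $count is 2.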
Regex explanation:
( - Group 1 (tag A from beginning to href parameter)
<[aA]\s - Match <a or <A followed by a whitespace character
[^>]* - Match anything after it except > because we want to match all parameters (like class, id etc.)
href=['"] - match href parameter with equal sign and ' or " after it
) - End group 1
( - Group 2 (content of href parameter)
[^"']+ - everything that is not ' or "
url\?q= - url?q=
( - Group 3 (URL we are really interested in)
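[A-z]+:\/{2} - match the protocol of the URL (http://, https://, ftp:// etc.). Note that [A-z] also matches the characters [ \ ] ^ _ and `, so [A-Za-z] is the stricter choice.
[^"'&]+ - match anything except ' " or &. These characters mark the end of the URL we are interested in.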
) - End group 3
[^"']* - anything except " or ' - this is the rest of the href value (the extra query parameters), up to the closing quote
) - End group 2
( - Group 4 - rest of the tag
["'] - " or ' closing href parameter
[^>]* - anything except > so we match rest of the tag
> - finally we match closing character >
) - End group 4
Then we replace the whole match with groups 1, 3 and 4 only.
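The same breakdown can be written into the pattern itself with the x (extended) modifier, which makes PCRE ignore whitespace and treat # as the start of a comment outside character classes. The sketch below is an assumption-laden variant rather than the answer's exact pattern: it uses [a-z] with the i modifier instead of [A-z], drops the m modifier because the pattern has no ^ or $ anchors, and the sample link (single-quoted href, upper-case tag) is made up for illustration.

```php
<?php
$regex = <<<'EOF'
%
(                         # group 1: start of the tag up to the opening quote of href
  <a\s [^>]* href=['"]
)
(                         # group 2: the whole original href value
  [^"']+ url\?q=
  (                       # group 3: the URL to keep
    [a-z]+ :\/{2} [^"'&]+
  )
  [^"']*                  # remaining query parameters, if any
)
(                         # group 4: closing quote and the rest of the tag
  ["'] [^>]* >
)
%ix
EOF;

// Hypothetical input: upper-case tag and single-quoted href.
$html = '<A class="lnk" href=\'https://www.somesite.com/url?q=http://www.thirdsite.se&q1=xxx\' id="lnk">Some text</A>';

// Groups 1, 3 and 4 are kept, exactly as in the one-line pattern.
echo preg_replace($regex, '$1$3$4', $html), PHP_EOL;
// prints: <A class="lnk" href='http://www.thirdsite.se' id="lnk">Some text</A>
```

Because % is the pattern delimiter here, the comments inside the pattern must not contain a % character.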