PHP: preg_match_all YouTube video IDs from text
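I want to extract the YouTube URL strings from a text, such as https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4, together with the video ID, such as 0EB7zh_7UE4, so that I can insert text after the URL string depending on the video ID. Here is my sample text: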
This is an example page will show up https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 Bike https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be&app=desktop messenger by day, aspiring actor by night, and this is my website. I live in https://youtu.be/1EB7zh_7UE4 Los Angeles, have a great dog named Jack, and I https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be like piña coladasdoohickeys https://www.youtube.com/watch?v=4EB7zh_7UE4 you should go to <a href="http://example.com/wp-admin/">your dashboard</a> to delete this page and create new pages for your content. Have fun!
https://www.youtube.com/watch?v=0EB7zh_7UE4
more
https://www.youtube.com/watch?v=2EB7zh_7UE4&feature=youtu.be
That\'s all..
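This is the regex I have so far, but it has the following problems:
It adds the (here) string before the end of the link string (in the middle of it). I want to add (here) at the very end of the YouTube URL link string.
It injects (here) multiple times.
See the code: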
function regex($sample_text) {
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])(.*?)\b#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $sample_text);
        }
    }
    return $processed_text;
}
echo regex($sample_test);
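Where did I go wrong?
Note: the question and the sample text have been updated.
To expand on my comment: on every iteration you build the result from the original string $sample_text, so each replacement overwrites the previous one. The fix is simple: initialize $processed_text once at the start and keep working on that.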
function regex($sample_text) {
    // Start from the original text and accumulate the replacements in it
    $processed_text = $sample_text;
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])(.*?)\b#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
        }
    }
    return $processed_text;
}
echo regex($sample_test);
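Your regex also does not match all the way to the end of the URL. For the sample text you provided, you can simply match anything that is not whitespace: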
'#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s'
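Note, however, that \S* stops only at whitespace, so it does not treat characters such as " or . as the end of a URL; a quote or full stop directly after a link ends up inside the match. You can handle those by excluding them in the group. You seem to have a good grasp of regex, so I assume you can work this out; if not, leave a comment and I will update my answer.
For completeness, here is the full code using my regex: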
function regex($sample_text) {
    // Start from the original text and accumulate the replacements in it
    $processed_text = $sample_text;
    if (preg_match_all('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s', $sample_text, $matches, PREG_SET_ORDER)) {
        print_r($matches);
        foreach ($matches as $match) {
            $add = ' (here)';
            $processed_text = str_replace($match[0], $match[0] . $add, $processed_text);
        }
    }
    return $processed_text;
}
echo regex($sample_test);
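The remark about characters such as " or . can be made concrete. What follows is only a sketch under a few assumptions, not the answer author's code: the helper name append_marker is hypothetical, the trailing \S* is swapped for a negated character class plus a lookbehind so that a closing quote or sentence punctuation stays outside the match, and the .+ in the watch?...&v= branch is narrowed so it cannot run across whitespace. It also uses a single preg_replace pass instead of str_replace in a loop, so each URL occurrence gets exactly one marker even when the same URL appears several times in the text.

```php
<?php
// Hypothetical helper: appends " (here)" once after every YouTube URL in $text.
function append_marker(string $text): string
{
    $pattern = '#(?:https?://)?(?:m\.|www\.)?'
        . '(?:youtu\.be/|youtube-nocookie\.com/embed/|youtube\.com/'
        . '(?:embed/|v/|e/|\?v=|shared\?ci=|watch\?v=|watch\?[^\s"<>]+&v=))'
        . '([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])'
        // Rest of the URL: stop at whitespace, quotes and angle brackets,
        // then back off any trailing sentence punctuation.
        . '[^\s"<>]*(?<![.,;:!?])#';

    // $0 is the whole matched URL; one pass means one marker per occurrence.
    return preg_replace($pattern, '$0 (here)', $text);
}

echo append_marker('See https://youtu.be/1EB7zh_7UE4. Then "https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be" twice.');
// See https://youtu.be/1EB7zh_7UE4 (here). Then "https://www.youtube.com/watch?v=0EB7zh_7UE4&feature=youtu.be (here)" twice.
```

The set of excluded characters is a judgment call; extend the two character classes if your text wraps links in other delimiters.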
<?php
$str = 'This is an example page will show up https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 Bike https://www.youtube.com/watch?v=1EB7zh_7UE4&feature=youtu.be&app=desktop messenger by day, aspiring actor by night, and this is my website. I live in https://youtu.be/2EB7zh_7UE4 Los Angeles, have a great dog named Jack, and I https://www.youtube.com/watch?v=3EB7zh_7UE4&feature=youtu.be like piña coladasdoohickeys https://www.youtube.com/watch?v=4EB7zh_7UE4 you should go to <a href="http://example.com/wp-admin/">your dashboard</a> to delete this page and create new pages for your content. Have fun!
https://www.youtube.com/watch?v=5EB7zh_7UE4
more
https://www.youtube.com/watch?v=6EB7zh_7UE4&feature=youtu.be
That\'s all.';
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $str, $match);

// youtube vid ID array placeholder
$youtubeVids = array();

// Going through each URL and retrieving the video ID
foreach ($match[0] as $url)
{
    // Parsing the URL
    $url = parse_url($url);

    // Retrieving the query if they exist
    if (isset($url['query']))
    {
        parse_str($url['query'], $yt_vid);
    }

    // Checking if we have the query parts
    if (isset($yt_vid['v']))
    {
        // Adding the vid ID to the vid ID list
        $youtubeVids[] = $yt_vid['v'];
    }
    else
    {
        // No queries, checking if we are checking a youtube vid (maybe regex better?)
        if (stripos($url['host'], 'youtu') !== false)
        {
            // Adding the ID to ID list (This is mainly for links like youtube.com/6EB7zh_7UE4 or youtu.be/6EB7zh_7UE4)
            $youtubeVids[] = substr($url['path'], 1);
        }
    }

    // Unsetting so it won't be set in the next loop
    unset($yt_vid);
}

print_r($youtubeVids);
?>
Output:
Array
(
[0] => 0EB7zh_7UE4
[1] => 1EB7zh_7UE4
[2] => 2EB7zh_7UE4
[3] => 3EB7zh_7UE4
[4] => 4EB7zh_7UE4
[5] => 5EB7zh_7UE4
[6] => 6EB7zh_7UE4
)
That said, I also found the following solution online:
preg_match_all('/(?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|e(?:mbed)?)\/|\S*?[?&]v=)|youtu\.be\/)([a-zA-Z0-9_-]{11})\W/', $str, $match);
print_r($match);
You can use:
https?://\S+?\Qyoutube.com\E\S+?v=\K[^&\s]+
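As a rough illustration of how that pattern might be used with preg_match_all (the sample string and the ~ delimiter are assumptions made for this sketch): \K discards everything matched so far, so the full match in $matches[0] is the video ID itself and no capture group is needed.

```php
<?php
// Sample text invented for this sketch; only the regex comes from the answer.
$text = 'Watch https://www.youtube.com/watch?time_continue=218&v=0EB7zh_7UE4 and '
      . 'https://www.youtube.com/watch?v=4EB7zh_7UE4&feature=youtu.be today.';

// \Q...\E quotes "youtube.com" literally; \K resets the start of the match,
// so $matches[0] holds only what follows "v=".
preg_match_all('~https?://\S+?\Qyoutube.com\E\S+?v=\K[^&\s]+~', $text, $matches);

print_r($matches[0]);
```

Two limitations worth noting: because the pattern requires youtube.com and a v= parameter, short youtu.be links are not matched, and the \S+? before youtube.com needs at least one character, so a URL without a subdomain such as www. is skipped as well.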
Just for the record, I ended up with this "simple" function, based on the regex above:
function filter($content) {
    return preg_replace_callback('#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s', function($match) {
        // $match[0] is the full URL, $match[1] the captured video ID
        return sprintf('%s my replace with 2nd parameter found %s', $match[0], $match[1]);
    }, $content);
}
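Since the original goal was to insert text that depends on the video ID, the callback approach could be extended along these lines. This is a sketch only: the function name annotate, the $notes lookup table and its contents are hypothetical, while the regex is the one from the function above.

```php
<?php
// Hypothetical lookup table: video ID => text to insert after the URL.
$notes = [
    '0EB7zh_7UE4' => ' (intro video)',
    '4EB7zh_7UE4' => ' (tutorial)',
];

function annotate(string $content, array $notes): string
{
    $pattern = '#(?:https?:\/\/)?(?:m\.|www\.)?(?:youtu\.be\/|youtube\-nocookie\.com\/embed\/|youtube\.com\/(?:embed\/|v\/|e\/|\?v=|shared\?ci=|watch\?v=|watch\?.+&v=))([-_A-Za-z0-9]{10}[AEIMQUYcgkosw048])\S*#s';

    return preg_replace_callback($pattern, function ($match) use ($notes) {
        // $match[1] is the video ID; IDs missing from the table get no suffix.
        return $match[0] . ($notes[$match[1]] ?? '');
    }, $content);
}

echo annotate('A https://youtu.be/0EB7zh_7UE4 B https://www.youtube.com/watch?v=9EB7zh_7UE4 C', $notes);
// A https://youtu.be/0EB7zh_7UE4 (intro video) B https://www.youtube.com/watch?v=9EB7zh_7UE4 C
```

Because preg_replace_callback visits each occurrence exactly once, a URL that appears several times in the text gets its suffix once per occurrence rather than once per loop iteration.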
This is what worked for me:
function FindYouTubeId($url)
{
    // Return the 11-character video ID, or null when $url is not a recognised YouTube URL
    if (preg_match('%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i', $url, $match)) {
        return $match[1];
    }
    return null;
}