PHP: Remove http://, http://www., https://, https://www. from a string and get the domain name and TLD
I want to create a function in PHP that strips any prefix such as
http://
https://
http://www.
https://www.
http://xyz.
from a given domain name such as
example.com
and returns an array like this:
'name' => 'example'
'tld' => 'com'
Any idea how to do this?
I don't think you need to remove the protocol, www, or even the subdomain. You only need to extract the name and TLD from the URL. So try this:
Regex solution:
<?php
$url = 'https://www.example.com#anchor';
$host = parse_url($url, PHP_URL_HOST); // www.example.com
// Capture the last two dot-separated labels of the host
preg_match('/(\w+)\.(\w+)$/', $host, $matches);
$array_result = array ( "name" => $matches[1],
                        "tld"  => $matches[2] );
print_r($array_result);
Without regex:
<?php
$url = 'https://www.example.com#anchor';
$host = parse_url($url, PHP_URL_HOST); // www.example.com
// Split the host into labels and take the last two
$host_names = explode(".", $host);
$array_result = array ( "name" => $host_names[count($host_names)-2],
                        "tld"  => $host_names[count($host_names)-1] );
print_r($array_result);
/*
 * Output:
 * Array
 * (
 *     [name] => example
 *     [tld] => com
 * )
 */
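One thing to watch with both snippets: parse_url() only returns a host when the input has a scheme (or starts with //), so a bare example.com or www.example.com yields null. Below is a minimal sketch of the function the question asks for, assuming it is acceptable to prepend http:// to scheme-less input. The name split_domain is made up for illustration, and the split is still the naive "last two labels" approach, so it is only right for single-label TLDs.

```php
<?php
// Hypothetical helper (not from the answer above): naive name/TLD split.
function split_domain(string $input): ?array
{
    $input = trim($input);

    // parse_url() needs a scheme to recognise the host, so add one
    // when the input is a bare host such as "www.example.com".
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $input)) {
        $input = 'http://' . $input;
    }

    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL or no host component
    }

    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return null; // e.g. "localhost" has no TLD to split off
    }

    // Take the last label as the TLD and the one before it as the name.
    $tld  = array_pop($labels);
    $name = array_pop($labels);

    return ['name' => $name, 'tld' => $tld];
}

print_r(split_domain('http://xyz.example.com'));
print_r(split_domain('example.com'));
```

Returning null instead of throwing is just one choice here; pick whatever fits your error handling.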
Try the following regex:
(?:^|\s)(?:https?:\/\/)?(?:\w+(?=\.).)?(?<name>.*).(?<tld>(?<=\.)\w+)
See the demo at https://regex101.com/r/lI2lB4/2
If your input is:
www.google.com
mail.yahoo.com.in
http://microsoft.com
http://www.google.com
http://mail.yahoo.co.uk
the captures will be:
MATCH 1
name = `google`
tld = `com`
MATCH 2
name = `yahoo.com`
tld = `in`
MATCH 3
name = `microsoft`
tld = `com`
MATCH 4
name = `google`
tld = `com`
MATCH 5
name = `yahoo.co`
tld = `uk`
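For reference, here is a rough sketch of how that pattern could be used from PHP, applying it to one input at a time with preg_match() and reading the named groups. The wrapper name extract_with_regex is hypothetical. Note that for multi-part suffixes the pattern still gives yahoo.co / uk, exactly as in the matches listed above.

```php
<?php
// Hypothetical wrapper around the regex from this answer.
function extract_with_regex(string $input): ?array
{
    // "~" is used as the delimiter; the pattern itself is unchanged.
    $pattern = '~(?:^|\s)(?:https?:\/\/)?(?:\w+(?=\.).)?(?<name>.*).(?<tld>(?<=\.)\w+)~';

    if (preg_match($pattern, $input, $m) !== 1) {
        return null; // no match (or a regex error)
    }

    // Named groups are available under their names in $m.
    return ['name' => $m['name'], 'tld' => $m['tld']];
}

foreach (['http://www.google.com', 'http://mail.yahoo.co.uk'] as $url) {
    print_r(extract_with_regex($url));
}
```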
The correct way to extract the real TLD is to use a package that works with the Public Suffix List. Only this way can you correctly extract domains with two- and three-level TLDs (co.uk, a.bg, b.bg, etc.). I recommend TLD Extract.
Sample code:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('http://mail.yahoo.co.uk');
$result->getSubdomain(); // will return (string) 'mail'
$result->getHostname(); // will return (string) 'yahoo'
$result->getSuffix(); // will return (string) 'co.uk'
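To show why a suffix list fixes the yahoo.co / uk problem, here is a toy sketch of the idea: try the longest candidate suffix first and treat the label just before it as the name. This is not how TLD Extract is implemented. SAMPLE_SUFFIXES is a tiny hand-picked list for illustration, not the real Public Suffix List (which also has wildcard and exception rules that this ignores), and split_host_by_suffix is a made-up name.

```php
<?php
// Illustrative subset only; NOT the real Public Suffix List.
const SAMPLE_SUFFIXES = ['com', 'in', 'uk', 'co.uk', 'com.in'];

// Hypothetical helper: split a host using a list of known suffixes.
function split_host_by_suffix(string $host, array $suffixes): ?array
{
    $labels = explode('.', strtolower($host));
    $count  = count($labels);

    // Start at index 1 so at least one label is left for the name;
    // the earliest index gives the longest candidate suffix.
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return [
                'subdomain' => implode('.', array_slice($labels, 0, $i - 1)),
                'name'      => $labels[$i - 1],
                'tld'       => $candidate,
            ];
        }
    }

    return null; // no known suffix, or nothing in front of it
}

print_r(split_host_by_suffix('mail.yahoo.co.uk', SAMPLE_SUFFIXES));
```

In real code you would let the library load and update the actual list rather than hard-coding suffixes like this.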