How to get the subdomain of a URL using PHP?
I have some URLs like these:
1. https://www.example.com/classname/method/arg // {nothing}
2. http://www.example.com/classname/method/arg // {nothing}
3. https://example.com/classname/method/arg // {nothing}
4. http://example.com/classname/method/arg // {nothing}
5. www.example.com/classname/method/arg // {nothing}
6. example.com/classname/method/arg // {nothing}
7. sub.example.com/classname/method/arg // sub
8. www.sub.example.com/classname/method/arg // sub
9. http://sub.example.com/classname/method/arg // sub
10. https://sub.example.com/classname/method/arg // sub
11. http://www.sub.example.com/classname/method/arg // sub
12. https://www.sub.example.com/classname/method/arg // sub
// $url ^ // What I want ^
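Now, as you can see, I want to get the subdomain of each of these URLs. How can I do that?
I have tried two approaches, but neither works for all of the URLs:
First (this only works for #7):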
echo array_shift((explode(".",$url)));
Second (a bit better):
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
echo $host[0];
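You are on the right track with explode(), but you should probably also use the parse_url() function to get the domain out of the URL (see the PHP documentation). TL;DR: give it a URL as its only parameter and you get back an array with all the parts of the URL broken out individually.
That said, the bigger problem is how to tell subdomain.somesite.com apart from somesite.co.uk: the first clearly has a subdomain, but the second does not. I'm afraid I have no sensible solution to offer other than comparing against a list of top-level domains.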
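The list comparison described above could be sketched roughly as follows. This is only an illustration: `subdomainOf()` is a hypothetical helper name, and the default `$suffixes` array is a tiny made-up sample, not a real or complete suffix list.

```php
<?php
// Hypothetical helper: returns the subdomain part of $host, or '' if there is none.
// $suffixes is an illustrative, incomplete sample; a real list would be far longer.
function subdomainOf(string $host, array $suffixes = ['co.uk', 'com', 'uk']): string
{
    $host = strtolower($host);
    // Try longer suffixes first so that "co.uk" wins over "uk".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (substr($host, -strlen($tail)) === $tail) {
            // Everything before the suffix, e.g. "subdomain.somesite" or "somesite".
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            array_pop($labels); // drop the site name itself ("somesite")
            return implode('.', $labels);
        }
    }
    return ''; // unknown suffix: make no guess
}

echo subdomainOf('subdomain.somesite.com'), "\n"; // prints "subdomain"
echo subdomainOf('somesite.co.uk'), "\n";         // prints an empty line
```

Sorting the suffixes by length is what keeps somesite.co.uk from being read as subdomain "somesite" of "co.uk".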
Use parse_url:
$url = 'http://sub.example.com/classname/method/arg';
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
$subdomain = $host[0];
echo $subdomain;
For multiple subdomains you should do this:
$url = 'http://en.sub.example.com/classname/method/arg';
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
$subdomains = array_slice($host, 0, count($host) - 2 );
print_r($subdomains);
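Neither snippet above covers the scheme-less URLs (5 to 8) or the leading "www" in the question, because parse_url() only fills in 'host' when a scheme is present. A minimal sketch that combines the same parse_url() and array_slice() idea with those two cases is shown below. `getSubdomain()` is a hypothetical name, and the sketch assumes the base domain is always exactly two labels (as in example.com), so it will misread hosts such as example.co.uk.

```php
<?php
// Hypothetical helper, not taken from the answers above.
// Assumption: the base domain is exactly two labels (e.g. example.com).
function getSubdomain(string $url): string
{
    // parse_url() only reports a host when a scheme is present,
    // so add a placeholder scheme for inputs like "sub.example.com/path".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return '';
    }
    $labels = explode('.', strtolower($host));
    // Treat a leading "www" as decoration rather than as a subdomain.
    if ($labels[0] === 'www') {
        array_shift($labels);
    }
    // Everything left of the last two labels is the subdomain.
    return implode('.', array_slice($labels, 0, max(0, count($labels) - 2)));
}

echo getSubdomain('https://www.example.com/classname/method/arg'), "\n"; // prints an empty line
echo getSubdomain('www.sub.example.com/classname/method/arg'), "\n";     // prints "sub"
```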
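I'm just going to leave this here...
Using @TwoStraws' idea, I created a function that uses the latest TLD list from data.iana.org to return the subdomain, base domain, and TLD parts of a given URL.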
function GetDomainParts($URL, $TLDs_List = 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt') {
    // Get a list of all top-level domains, one per line
    $TLDs = array_map('trim', explode("\n", file_get_contents($TLDs_List)));
    // Drop comment lines (the file starts with a "# Version ..." line) and blank lines, then reindex
    $TLDs = array_values(array_filter($TLDs, function ($TLD) {
        return $TLD !== '' && $TLD[0] !== '#';
    }));
    // Since that list has all the country codes too, let's assume all 2-letter domains are country codes, and get that list too
    $CC_TLDs = [];
    foreach ($TLDs as $TLD) {
        if (strlen($TLD) == 2) {
            $CC_TLDs[] = $TLD;
        }
    }
    // Now let's take our URL and pull out the host.
    // parse_url() only reports a host when the URL has a scheme, so add one if it is missing
    if (strpos($URL, '://') === false) {
        $URL = 'http://' . $URL;
    }
    $ParsedUrl = parse_url($URL);
    $Host = isset($ParsedUrl['host']) ? explode('.', $ParsedUrl['host']) : [];
    // If we can't find them, these stay false (returned as empty strings)
    $BaseDomain = false;
    $TLDDomain = false;
    // Look at the last 2 items in the Host array, these will be our TLDs (possibly)
    $N_Minus_1 = strtoupper(isset($Host[count($Host)-1]) ? $Host[count($Host)-1] : '');
    $N_Minus_2 = strtoupper(isset($Host[count($Host)-2]) ? $Host[count($Host)-2] : '');
    // This has the potential of being our base domain, but may not be there
    $N_Minus_3 = strtoupper(isset($Host[count($Host)-3]) ? $Host[count($Host)-3] : '');
    // We first check our N Minus 1 against our list of country code TLDs
    if (in_array($N_Minus_1, $CC_TLDs)) {
        // If N Minus 1 is a country code, we check whether N Minus 2 is in the TLDs array
        if (in_array($N_Minus_2, $TLDs)) {
            // If N Minus 2 is in the list of TLDs, we assume it is part of the TLD, making N Minus 3 our base domain
            $BaseDomain = $N_Minus_3;
            $TLDDomain = $N_Minus_2.'.'.$N_Minus_1;
            // We remove the parts that are used, the rest is our subdomain
            array_pop($Host);
            array_pop($Host);
            array_pop($Host);
            $SubDomain = implode('.', $Host);
        } else {
            // If N Minus 2 is NOT in the list of TLDs, we assume it is our base domain
            $BaseDomain = $N_Minus_2;
            $TLDDomain = $N_Minus_1;
            // We remove the parts that are used, the rest is our subdomain
            array_pop($Host);
            array_pop($Host);
            $SubDomain = implode('.', $Host);
        }
    } else {
        // If N Minus 1 is NOT a country code, we can assume it is the TLD; let's check it against the TLDs to make sure
        if (in_array($N_Minus_1, $TLDs)) {
            // If N Minus 1 is in our list of TLDs, we have found our TLD, so N Minus 2 must be our base domain
            $BaseDomain = $N_Minus_2;
            $TLDDomain = $N_Minus_1;
            // We remove the parts that are used, the rest is our subdomain
            array_pop($Host);
            array_pop($Host);
            $SubDomain = implode('.', $Host);
        } else {
            // If N Minus 1 is NOT in our list of TLDs, it is either a new TLD unknown to iana.org or does not exist; let's assume it is the TLD
            $BaseDomain = $N_Minus_2;
            $TLDDomain = $N_Minus_1;
            // We remove the parts that are used, the rest is our subdomain
            array_pop($Host);
            array_pop($Host);
            $SubDomain = implode('.', $Host);
            // Not sure if it is needed, but at this point we could swap the checks, checking minus 2 as the country code and minus 1 as the TLD,
            // but I am not sure this is ever a real-world scenario, and I am unable to find any proof to support this theory
        }
    }
    // Return our URL parts ( DISCLAIMER: Note that this will not solve every URL, such as WWW.AFAMILYCOMPANY.CO,
    // because both AFAMILYCOMPANY and CO are TLDs, one being a TLD and the other a country code, leaving "WWW" as the base domain.
    // I use this functionality to auto-populate a user-changeable setting, so if my assumption is wrong the user can fix it.
    // One should not assume this will work 100% of the time! )
    return [strtolower($SubDomain), strtolower($BaseDomain), strtolower($TLDDomain)];
}
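Please read the disclaimer...
Note that this will not solve every URL, such as WWW.AFAMILYCOMPANY.CO: both AFAMILYCOMPANY and CO are in the TLD list (one is a TLD, the other a country code), which leaves "WWW" as the base domain. I use this function to auto-populate a user-changeable setting, so if my assumption is wrong the user can fix it. Do not assume it will work 100% of the time!
Also note that http://whois.domaintools.com/afamilycompany.co lists this as a "Restricted and Reserved Names" domain. If the internet is working as intended, domains like this should never go into production anyway, so this function is safe.
A simple way to check whether this function will work for your domain is to go to http://data.iana.org/TLD/tlds-alpha-by-domain.txt, press Ctrl+F, and check whether the domain name is in the list. If it is, this function will fail; if it is not, this function will work 100% of the time. I realize this is only a step in the right direction, so if anyone can add to this idea, please let me know.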
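The manual Ctrl+F check can also be approximated in code. The sketch below is an assumption-laden illustration rather than part of the function above: `isAmbiguousHost()` is a hypothetical name, and `$sampleTlds` is a small hand-written sample standing in for the IANA file. It flags hosts that end in a two-letter TLD where the label before it is also a listed TLD, which is the situation where the function has to guess.

```php
<?php
// Hypothetical helper: true when the host ends in a two-letter (country code)
// TLD and the label before it is also in the TLD list, so the split is a guess.
function isAmbiguousHost(string $host, array $tlds): bool
{
    $labels = explode('.', strtoupper($host));
    $count = count($labels);
    if ($count < 2) {
        return false;
    }
    $last = $labels[$count - 1];
    $secondLast = $labels[$count - 2];
    return strlen($last) === 2
        && in_array($last, $tlds, true)
        && in_array($secondLast, $tlds, true);
}

// Illustrative sample only; the real list is the IANA file (upper-case entries).
$sampleTlds = ['COM', 'CO', 'UK', 'AFAMILYCOMPANY'];

var_dump(isAmbiguousHost('www.afamilycompany.co', $sampleTlds)); // bool(true)
var_dump(isAmbiguousHost('www.example.com', $sampleTlds));       // bool(false)
```

A host such as example.co.uk is also flagged, even though the function happens to split it correctly; the flag only means the result depends on the assumption and is worth letting the user confirm.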