How to validate domain name in PHP?

Is it possible without using regular expression?

For example, I want to check that a string is a valid domain:

domain-name
abcd
example

Are valid domains. These are invalid of course:

domaia@name
ab$%cd

And so on. So basically it should start with an alphanumeric character, then there may be more alnum characters plus also a hyphen. And it must end with an alnum character, too.

If it's not possible, could you suggest a regexp pattern to do this?

EDIT:

Why doesn't this work? Am I using preg_match incorrectly?

$domain = '@djkal';
$regexp = '/^[a-zA-Z0-9][a-zA-Z0-9-\_]+[a-zA-Z0-9]$/';
if (false === preg_match($regexp, $domain)) {
    throw new Exception('Domain invalid');
}
Answers
<?php
function is_valid_domain_name($domain_name)
{
    return (preg_match("/^([a-z\d](-*[a-z\d])*)(\.([a-z\d](-*[a-z\d])*))*$/i", $domain_name) //valid chars check
            && preg_match("/^.{1,253}$/", $domain_name) //overall length check
            && preg_match("/^[^\.]{1,63}(\.[^\.]{1,63})*$/", $domain_name)   ); //length of each label
}
?>

Test cases:

is_valid_domain_name? [a]                       Y
is_valid_domain_name? [0]                       Y
is_valid_domain_name? [a.b]                     Y
is_valid_domain_name? [localhost]               Y
is_valid_domain_name? [google.com]              Y
is_valid_domain_name? [news.google.co.uk]       Y
is_valid_domain_name? [xn--fsqu00a.xn--0zwm56d] Y
is_valid_domain_name? [goo gle.com]             N
is_valid_domain_name? [google..com]             N
is_valid_domain_name? [google.com ]             N
is_valid_domain_name? [google-.com]             N
is_valid_domain_name? [.google.com]             N
is_valid_domain_name? [<script]                 N
is_valid_domain_name? [alert(]                  N
is_valid_domain_name? [.]                       N
is_valid_domain_name? [..]                      N
is_valid_domain_name? [ ]                       N
is_valid_domain_name? [-]                       N
is_valid_domain_name? []                        N

With this you will not only be checking if the domain has a valid format, but also if it is active / has an IP address assigned to it.

$domain = "stackoverflow.com";

if(filter_var(gethostbyname($domain), FILTER_VALIDATE_IP))
{
    return TRUE;
}

Note that this method requires the DNS entries to be active, so if you need to validate a domain string that is not in the DNS, use the regular expression method given by velcrow above.

Also, this function is not intended to validate a URL string; use FILTER_VALIDATE_URL for that. We do not use FILTER_VALIDATE_URL for a domain because a domain string is not a valid URL.
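
To illustrate the last point, here is a minimal sketch of how FILTER_VALIDATE_URL treats a bare domain versus the same domain with a scheme. The example strings are only illustrations.

```php
<?php
// A bare domain has no scheme, so it is not a valid URL.
var_dump(filter_var('stackoverflow.com', FILTER_VALIDATE_URL));

// With a scheme in front, the same host passes URL validation
// and filter_var() returns the input string.
var_dump(filter_var('http://stackoverflow.com', FILTER_VALIDATE_URL));
```

The first call returns false and the second returns the string, which is why a domain has to be checked by other means or wrapped in a URL first.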

PHP 7

// Validate a domain name
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN));
# string(33) "mandrill._domainkey.mailchimp.com"

// Validate a hostname (here, the underscore is invalid)
var_dump(filter_var('mandrill._domainkey.mailchimp.com', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
# bool(false)

It is not documented here: http://www.php.net/filter.filters.validate and a bug request for this is located here: https://bugs.php.net/bug.php?id=72013

Use checkdnsrr: http://php.net/manual/en/function.checkdnsrr.php

$domain = "stackoverflow.com";

checkdnsrr($domain , "A");

// returns true if the domain has a DNS A record, false otherwise

Firstly, you should clarify whether you mean:

  1. individual domain name labels
  2. entire domain names (i.e. multiple dot-separated labels)
  3. host names

The reason the distinction is necessary is that a label can technically include any characters, including the NUL, @ and . characters. DNS is 8-bit capable and it's perfectly possible to have a zone file containing an entry reading "anodd.l@bel". It's not recommended of course, not least because people would have difficulty telling a dot inside a label from those separating labels, but it is legal.

However, URLs require a host name in them, and those are governed by RFCs 952 and 1123. Valid host names are a subset of domain names. Specifically, only letters, digits and hyphens are allowed. Furthermore, the first and last characters cannot be a hyphen. RFC 952 didn't permit a digit as the first character, but RFC 1123 subsequently relaxed that.

Hence:

  • a - valid
  • 0 - valid
  • a- - invalid
  • a-b - valid
  • xn--dasdkhfsd - valid (punycode encoding of an IDN)

Off the top of my head I don't think it's possible to invalidate the a- example with a single simple regexp. The best I can come up with to check a single host label is:

if (preg_match('/^[a-z\d][a-z\d-]{0,62}$/i', $label) &&
   !preg_match('/-$/', $label))
{
    # label is legal within a hostname
}
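
For comparison, the same label rules can probably be folded into one pattern by making the tail optional and forcing it to end in a letter or digit. This is only a sketch; is_valid_host_label() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Sketch: validate one host name label (RFC 952 / RFC 1123 rules) with one pattern.
// is_valid_host_label() is a hypothetical helper, not part of PHP.
function is_valid_host_label(string $label): bool
{
    // First character: letter or digit. Optional tail: up to 61 letters,
    // digits or hyphens followed by a final letter or digit, so the label
    // is 1 to 63 characters long and never ends in a hyphen.
    // \z anchors at the very end, so a trailing newline is rejected too.
    return preg_match('/^[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?\z/i', $label) === 1;
}

foreach (['a', '0', 'a-', 'a-b', 'xn--dasdkhfsd'] as $label) {
    echo $label, ' - ', is_valid_host_label($label) ? 'valid' : 'invalid', PHP_EOL;
}
```

The loop prints the same verdicts as the list above: only a- is reported as invalid.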

To further complicate matters, some domain name entries (typically SRV records) use labels prefixed with an underscore, e.g. _sip._udp.example.com. These are not host names, but are legal domain names.

Here is another way without regex.

$myUrl = "http://www.domain.com/link.php";
$myParsedURL = parse_url($myUrl);
$myDomainName = $myParsedURL['host'];
$ipAddress = gethostbyname($myDomainName);
if($ipAddress == $myDomainName)
{
   echo "There is no url";
}
else
{
   echo "url found";
}

I think once you have isolated the domain name, say, using Erklan's idea:

$myUrl = "http://www.domain.com/link.php";
$myParsedURL = parse_url($myUrl);
$myDomainName = $myParsedURL['host'];

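you could use:

// FILTER_VALIDATE_URL needs a scheme, so a bare host name on its own always fails;
// prepend one before validating.
if (false === filter_var('http://' . $myDomainName, FILTER_VALIDATE_URL)) {
    // failed test
}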

PHP 5's filter functions are for just such a purpose, I would have thought.

It does not strictly answer your question as it does not use Regex, I realise.

A regular expression is the most effective way of validating a domain. If you're dead set on not using a regular expression (which IMO is stupid), then you could split the domain into its parts:

  • www. / sub-domain
  • domain name
  • .extension

You would then have to check each character in some sort of a loop to see that it matches a valid domain.

Like I said, it's much more effective to use a regular expression.
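
A rough sketch of that split-and-loop approach, without any regex. It assumes host name rules (letters, digits and hyphens per label, no hyphen at either end of a label) and the default C locale for ctype_alnum(); has_valid_labels() is a hypothetical name.

```php
<?php
// Sketch only: has_valid_labels() is a hypothetical helper, not part of PHP.
function has_valid_labels(string $domain): bool
{
    // Overall length limit for a host name.
    if ($domain === '' || strlen($domain) > 253) {
        return false;
    }
    foreach (explode('.', $domain) as $label) {
        $len = strlen($label);
        // Empty labels come from leading, trailing or doubled dots.
        if ($len === 0 || $len > 63) {
            return false;
        }
        // A label may not start or end with a hyphen.
        if ($label[0] === '-' || $label[$len - 1] === '-') {
            return false;
        }
        // Check every character: only letters, digits and hyphens pass.
        for ($i = 0; $i < $len; $i++) {
            if ($label[$i] !== '-' && !ctype_alnum($label[$i])) {
                return false;
            }
        }
    }
    return true;
}

var_dump(has_valid_labels('news.google.co.uk'));
var_dump(has_valid_labels('google-.com'));
```

It takes noticeably more code than a single pattern, which is the point made above.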

Your regular expression is fine, but you're not using preg_match right. It returns an int (1 for a match, 0 for no match) and false only when an error occurs, so comparing against false never catches a non-match. Just write if(!preg_match($regex, $string)) { ... }
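
A small sketch of the difference, using the pattern and test string from the question. assert_valid_domain() is a hypothetical wrapper added only for illustration.

```php
<?php
// The pattern from the question: alphanumeric at both ends,
// letters, digits, hyphens or underscores in between.
$regexp = '/^[a-zA-Z0-9][a-zA-Z0-9\-_]+[a-zA-Z0-9]$/';

var_dump(preg_match($regexp, 'domain-name')); // 1: the subject matched
var_dump(preg_match($regexp, '@djkal'));      // 0: no match, but not false

// preg_match() returns false only on an error such as a malformed pattern,
// so `false === preg_match(...)` never fires for ordinary invalid input.
function assert_valid_domain(string $domain, string $regexp): void // hypothetical helper
{
    if (preg_match($regexp, $domain) !== 1) {
        throw new Exception('Domain invalid');
    }
}
```

Comparing against 1 treats both "no match" and "error" as invalid, which is usually what a validator wants.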

If you don't want to use regular expressions, you can try this:

$str = 'domain-name';

if (ctype_alnum(str_replace('-', '', $str)) && $str[0] != '-' && $str[strlen($str) - 1] != '-') {
    echo "Valid domain\n";
} else {
    echo "Invalid domain\n";
}

but as said regexp are the best tool for this.

If you want to check whether a particular domain name or ip address exists or not, you can also use checkdnsrr
Here is the doc http://php.net/manual/en/function.checkdnsrr.php

A valid domain is, for me, something I'm able to register, or at least something that looks like I could register it. This is the reason why I like to separate this from "localhost"-names.

And finally I was interested in the main question if avoiding Regex would be faster and this is my result:

<?php
function filter_hostname($name, $domain_only=false) {
    // entire hostname has a maximum of 253 ASCII characters
    if (!($len = strlen($name)) || $len > 253
    // .example.org and localhost- are not allowed
    || $name[0] == '.' || $name[0] == '-' || $name[ $len - 1 ] == '.' || $name[ $len - 1 ] == '-'
    // a.de is the shortest possible domain name and needs one dot
    || ($domain_only && ($len < 4 || strpos($name, '.') === false))
    // several combinations are not allowed
    || strpos($name, '..') !== false
    || strpos($name, '.-') !== false
    || strpos($name, '-.') !== false
    // only letters, numbers, dot and hyphen are allowed
/*
    // a little bit slower
    || !ctype_alnum(str_replace(array('-', '.'), '', $name))
*/
    || preg_match('/[^a-z\d.-]/i', $name)
    ) {
        return false;
    }
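    // each label may contain up to 63 characters
    $offset = 0;
    while (($pos = strpos($name, '.', $offset)) !== false) {
        if ($pos - $offset > 63) {
            return false;
        }
        $offset = $pos + 1;
    }
    // the loop stops at the last dot, so check the final label as well
    if ($len - $offset > 63) {
        return false;
    }
    return $name;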
}
?>

Benchmark results compared with velcrow's function and 10000 iterations (the complete results contain many code variants; it was interesting to find the fastest):

filter_hostname($domain);// $domains: 0.43556308746338 $real_world: 0.33749794960022
is_valid_domain_name($domain);// $domains: 0.81832790374756 $real_world: 0.32248711585999

$real_world did not contain extremely long domain names, in order to produce better results. And now I can answer your question: with ctype_alnum() it would be possible to do this without regex, but as preg_match() was faster I would prefer that.

If you don't like the fact that "local.host" is a valid domain name, use this function instead, which validates against a public TLD list. Maybe someone will find the time to combine both.

The correct answer is that you don't ... you let a unit-tested tool do the work for you:

// return '' if host invalid --
private function setHostname($host = '')
{
    $ret = (!empty($host)) ? $host : '';
    if(filter_var('http://'.$ret.'/', FILTER_VALIDATE_URL) === false) {
        $ret = '';
    }
    }
    return $ret;
}

Further reading: https://www.w3schools.com/php/filter_validate_url.asp

If you can run shell commands, the following is the best way to determine whether a domain is registered.

This function returns false if the domain name isn't registered; otherwise it returns the domain name.

function get_domain_name($domain) { 
    //Step 1 - Return false if any shell sensitive chars or space/tab were found
    if(escapeshellcmd($domain)!=$domain || count(explode(".", $domain))<2 || preg_match("/[\s\t]/", $domain)) {
            return false;
    }

    //Step 2 - Get the root domain in-case of subdomain
    $domain = (count(explode(".", $domain))>2 ? strtolower(explode(".", $domain)[count(explode(".", $domain))-2].".".explode(".", $domain)[count(explode(".", $domain))-1]) : strtolower($domain));

    //Step 3 - Run shell command 'dig' to get SOA servers for the domain extension
    $ns = shell_exec(escapeshellcmd("dig +short SOA ".escapeshellarg(explode(".", $domain)[count(explode(".", $domain))-1]))); 

    //Step 4 - Return false if invalid extension (returns NULL), or take the first server address out of output
    if($ns===NULL) {
            return false;
    }
    $ns = (((preg_split('/\s+/', $ns)[0])[strlen(preg_split('/\s+/', $ns)[0])-1]==".") ? substr(preg_split('/\s+/', $ns)[0], 0, strlen(preg_split('/\s+/', $ns)[0])-1) : preg_split('/\s+/', $ns)[0]);

    //Step 5 - Run another dig using the obtained address for our domain, and return false if returned NULL else return the domain name. This assumes an authoritative NS is assigned when a domain is registered, can be improved to filter more accurately.
    $ans = shell_exec(escapeshellcmd("dig +noall +authority ".escapeshellarg("@".$ns)." ".escapeshellarg($domain))); 
    return (($ans===NULL) ? false : ((strpos($ans, $ns)>-1) ? false : $domain));
}

Pros

  1. Works on any domain, while php dns functions may fail on some domains. (my .pro domain failed on php dns)
  2. Works on fresh domains without any dns (like A) records
  3. Unicode friendly

Cons

  1. Usage of shell execution, probably

<?php

if (is_valid_domain('https://www.google.com') == 1) {
    echo 'Valid';
} else {
    echo 'InValid';
}

function is_valid_domain($url) {

    $validation = FALSE;
    /* Parse URL */
    $urlparts = parse_url(filter_var($url, FILTER_SANITIZE_URL));

    /* Check host exists, else assign path to host */
    if (!isset($urlparts['host'])) {
        $urlparts['host'] = $urlparts['path'];
    }

    if ($urlparts['host'] != '') {
        /* Add scheme if not found */
        if (!isset($urlparts['scheme'])) {
            $urlparts['scheme'] = 'http';
        }

        /* Validation */
        if (checkdnsrr($urlparts['host'], 'A') && in_array($urlparts['scheme'], array('http', 'https')) && ip2long($urlparts['host']) === FALSE) {
            $urlparts['host'] = preg_replace('/^www\./', '', $urlparts['host']);
            $url = $urlparts['scheme'] . '://' . $urlparts['host'] . "/";

            if (filter_var($url, FILTER_VALIDATE_URL) !== false && @get_headers($url)) {
                $validation = TRUE;
            }
        }
    }

    return $validation;

}
?>

After reading all the issues with the other functions, I decided I needed something more accurate. Here's what I came up with that works for me.

If you need to specifically validate hostnames (they must start and end with an alphanumeric character and contain only alphanumerics and hyphens), this function should be enough.

function is_valid_domain($domain) {
    // Check for starting and ending hyphen(s)
    if(preg_match('/-\./', $domain) || substr($domain, 0, 1) == '-') {
        return false;
    }

    // Detect and convert international UTF-8 domain names to IDNA ASCII form
    if(mb_detect_encoding($domain) != "ASCII") {
        $idn_dom = idn_to_ascii($domain);
    } else {
        $idn_dom = $domain;
    }

    // Validate
    if(filter_var($idn_dom, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) != false) {
        return true;
    }
    return false;
}

Note that this function will work on most LTR languages (I haven't tested all of them). It will not work on RTL languages.

is_valid_domain('a');                                                                       Y
is_valid_domain('a.b');                                                                     Y
is_valid_domain('localhost');                                                               Y
is_valid_domain('google.com');                                                              Y
is_valid_domain('news.google.co.uk');                                                       Y
is_valid_domain('xn--fsqu00a.xn--0zwm56d');                                                 Y
is_valid_domain('area51.com');                                                              Y
is_valid_domain('japanese.コム');                                                           Y
is_valid_domain('домейн.бг');                                                               Y
is_valid_domain('goo gle.com');                                                             N
is_valid_domain('google..com');                                                             N
is_valid_domain('google-.com');                                                             N
is_valid_domain('.google.com');                                                             N
is_valid_domain('<script');                                                                 N
is_valid_domain('alert(');                                                                  N
is_valid_domain('.');                                                                       N
is_valid_domain('..');                                                                      N
is_valid_domain(' ');                                                                       N
is_valid_domain('-');                                                                       N
is_valid_domain('');                                                                        N
is_valid_domain('-günter-.de');                                                             N
is_valid_domain('-günter.de');                                                              N
is_valid_domain('günter-.de');                                                              N
is_valid_domain('sadyasgduysgduysdgyuasdgusydgsyudgsuydgusydgsyudgsuydusdsdsdsaad.com');    N
is_valid_domain('2001:db8::7');                                                             N
is_valid_domain('876-555-4321');                                                            N
is_valid_domain('1-876-555-4321');                                                          N

I know that this is an old question, but it was the first answer on a Google search, so it seems relevant. I recently had this same problem. The solution in my case was to just use the Public Suffix List:

https://publicsuffix.org/learn/

The language-specific libraries listed there should all allow for easy validation of not just domain format, but also top-level domain validity.




