  1. #1

    I'm building a URL validator using regex and I'm stumped

    Hi,

    I'm building a URL validator that checks the URL against regular expression patterns.
    I've got as far as the domain extension and I'm not sure how to handle that part.
    There are a large number of extensions, and some of them have further extensions under them (.ac exists on its own, but there is also .ac.uk, and so on), so I'm not sure where to start.

    Any tips on what would be the best solution for doing this part?

    Thanks,
    Michael.

  2. #2
    The last part of the domain defines the region it belongs to or its general purpose (mobi, com, org, etc.). You can get the complete list here: http://www.iana.org/domains/root/db/

    The rest may contain only ASCII letters, digits and hyphens. More details are in the "Preferred name syntax" section here: http://www.faqs.org/rfcs/rfc1035.html

  3. #3
    Unless you plan to maintain an ever-growing list of TLDs and country codes, you can't really validate it without doing a DNS lookup. Check it for proper form and be done with it.
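
    A form-only check might look something like the sketch below. It follows the letters, digits and hyphens rule from the RFC 1035 preferred name syntax mentioned earlier in the thread, relaxed to allow a leading digit as RFC 1123 does. The function name isWellFormedHost() is made up for illustration, and requiring an alphabetic TLD is an assumption that rules out internationalised "xn--" TLDs.

```php
<?php
/**
 * Checks a host name for proper form only; it does not prove the domain exists.
 * isWellFormedHost() is a hypothetical helper name, not something from the thread.
 */
function isWellFormedHost(string $host): bool
{
    // A full name is limited to 253 characters in its dotted text form.
    if ($host === '' || strlen($host) > 253) {
        return false;
    }

    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return false; // require at least "name.tld"
    }

    foreach ($labels as $label) {
        // Each label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
        if (preg_match('/^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i', $label) !== 1) {
            return false;
        }
    }

    // Assumption: the last label (the TLD) is purely alphabetic, at least two letters.
    return preg_match('/^[a-z]{2,}$/i', $labels[count($labels) - 1]) === 1;
}
```

    Because it only looks at the shape of each label, example.ac.uk and example.uk pass in exactly the same way; nothing here needs to know which extensions exist.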

  4. #4
    logic_earth
    Originally posted by crmalibu:
    Unless you plan to maintain an ever-growing list of TLDs and country codes, you can't really validate it without doing a DNS lookup. Check it for proper form and be done with it.
    Or just download this file every so often and parse it. http://data.iana.org/TLD/tlds-alpha-by-domain.txt
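
    Downloading and parsing that file could be sketched roughly as follows. It assumes the format IANA publishes (one upper-case TLD per line, plus "#" comment lines such as the version stamp at the top). The function names, the cache file and the one-day refresh interval are all illustrative choices, not anything prescribed by IANA.

```php
<?php
/**
 * Turns the text of tlds-alpha-by-domain.txt into a lookup table.
 * Assumed format: one TLD per line, lines starting with "#" are comments.
 */
function parseTldList(string $text): array
{
    $tlds = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and the version comment
        }
        $tlds[strtolower($line)] = true;
    }
    return $tlds;
}

/**
 * Loads the list, refreshing the local copy when it is older than $maxAge seconds.
 * Falls back to the stale copy (or an empty list) if the download fails.
 */
function loadTldList(string $cacheFile, int $maxAge = 86400): array
{
    $stale = !is_file($cacheFile) || (time() - filemtime($cacheFile)) > $maxAge;
    if ($stale) {
        $text = @file_get_contents('http://data.iana.org/TLD/tlds-alpha-by-domain.txt');
        if ($text !== false) {
            file_put_contents($cacheFile, $text);
        }
    }
    return is_file($cacheFile) ? parseTldList(file_get_contents($cacheFile)) : [];
}

/** Case-insensitive lookup; accepts "uk" or ".uk". */
function isKnownTld(string $tld, array $tlds): bool
{
    return isset($tlds[strtolower(ltrim($tld, '.'))]);
}
```

    Usage would be along the lines of $tlds = loadTldList(sys_get_temp_dir() . '/tlds.txt'); followed by isKnownTld('uk', $tlds). Note that the file only lists top-level domains, so for example.ac.uk only the final "uk" can be checked this way; second-level extensions such as .ac.uk are not in it.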


  5. #5
    AnthonySterling
    This may be of use, although I suspect a DNS lookup will be faster unless you cache the TLD source.

    PHP Code:
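    <?php
    /**
     * @desc    Validates a top-level domain against the list
     *          published by iana.org.
     *
     * @author  Anthony Sterling
     *
     * @param   string  $sTLD
     * @return  bool
     */
    function validateTLD($sTLD)
    {
        $sHtml = @file_get_contents('http://www.iana.org/domains/root/db/');
        if ($sHtml === false) {
            return false; // the list could not be downloaded
        }

        // loadHTML() is an instance method; "@" hides warnings about sloppy markup.
        $oDoc = new DOMDocument();
        @$oDoc->loadHTML($sHtml);

        // This XPath depends on the layout of the IANA page and may need adjusting.
        $oXPath = new DOMXPath($oDoc);
        foreach ($oXPath->query('/html/body/div[2]/div/div//a') as $oNode) {
            if ($oNode->nodeValue == strtoupper($sTLD)) {
                return true;
            }
        }
        return false;
    }
    ?>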


    Edit: Bah, Logic beat me to it AND provided a better source.

