Thread: Parsing URL

  1. #1
    AdSpeed.com Son Nguyen's Avatar
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241

    Parsing URL

    Hey guys,

    I'm wondering how you identify the domain (including any subdomain) of an arbitrary URL.
    For example:
    http://sub.domain.co.uk/dir1/dir2/index.php?t=1&v=2
    --> http://sub.domain.co.uk/
    http://domain.co.uk/index.php?t=1&v=2
    --> http://domain.co.uk/
    http://sub2.sub.domain.co.uk/index.php?t=1&v=2
    --> http://sub2.sub.domain.co.uk/

    There are potentially many other variations.
    This is what Spamcop.net does when it parses an email's headers.

    Any suggestions would be great.
    Thanks
    - Son Nguyen
    AdSpeed.com - Ad Serving and Ad Management Made Easy
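
    For the examples in the first post, one possible approach is to let a URL parser do the splitting instead of cutting the string by hand. The sketch below is in Python, which the thread itself does not use, and `site_root` is a hypothetical helper name. It assumes absolute URLs and deliberately drops the path, query string, fragment, userinfo and port.

```python
from urllib.parse import urlsplit


def site_root(url: str) -> str:
    """Return 'scheme://host/' for an absolute URL.

    Hypothetical helper for illustration: the path, query string,
    fragment, userinfo and port are all discarded.
    """
    parts = urlsplit(url)
    # urlsplit lowercases both the scheme and the hostname.
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.hostname}/"


if __name__ == "__main__":
    for example in (
        "http://sub.domain.co.uk/dir1/dir2/index.php?t=1&v=2",
        "http://domain.co.uk/index.php?t=1&v=2",
        "http://sub2.sub.domain.co.uk/index.php?t=1&v=2",
    ):
        print(example, "-->", site_root(example))
```

    This only answers the "domain including subdomain" part of the question; it does not know where the subdomain ends and the registered domain begins.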

  2. #2
    SitePoint Addict lveale's Avatar
    Join Date
    Jun 2001
    Location
    Dublin
    Posts
    221

  3. #3
    AdSpeed.com Son Nguyen's Avatar
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Thanks,

    I was also thinking along the lines of getting the root string for the domain by stripping out the subdomain, like
    sub.ad-rotator.com -> ad-rotator.com
    and also with a country suffix:
    sub.domain.co.uk -> domain.co.uk
    sub1.sub2.domain.co.vn -> domain.co.vn
    ...

    Any solution? Thanks
    - Son Nguyen

  4. #4
    SitePoint Zealot
    Join Date
    Mar 2002
    Location
    Perth, Australia
    Posts
    157
    That would be kind of tricky, since the definition of a subdomain depends on where you draw the line.
    Take sub.domain.com.au (since I'm an Aussie): the root domain is actually .au, but you would probably want domain.com.au. You would need to come up with a list of all the root domains that you are interested in (.com, .co.uk, .net.au, .cz, etc.) and then, for each hostname, discard everything up to the last server-name part before the matched root domain, i.e.

    sub1.sub2.domain.com.au
    => root domain is .com.au
    => discard sub1.sub2.
    => keep domain.com.au
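
    The steps above can be sketched roughly as follows. This is Python, not something used in the thread, and both `KNOWN_SUFFIXES` and `registered_domain` are hypothetical names. The suffix set is a tiny hand-picked sample for illustration only; a real deployment would need a much fuller list, for instance one built from a maintained source such as the Public Suffix List.

```python
from typing import Optional, Set

# Hypothetical, deliberately tiny sample of "root domains" to match against.
KNOWN_SUFFIXES: Set[str] = {
    "com", "net", "cz",
    "com.au", "net.au",
    "co.uk", "co.vn",
}


def registered_domain(hostname: str,
                      suffixes: Set[str] = KNOWN_SUFFIXES) -> Optional[str]:
    """Return the suffix plus the one label before it, or None if no match."""
    labels = hostname.lower().rstrip(".").split(".")
    # Start at index 1 so at least one label is always left before the suffix.
    # Smaller indexes give longer candidates, so "com.au" is tried before "au".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # Keep the single label in front of the matched suffix.
            return ".".join(labels[i - 1:])
    return None


if __name__ == "__main__":
    for host in ("sub1.sub2.domain.com.au", "sub.ad-rotator.com",
                 "sub.domain.co.uk", "sub1.sub2.domain.co.vn"):
        print(host, "->", registered_domain(host))
```

    Returning None for an unknown suffix is a design choice in this sketch: it makes gaps in the list visible instead of silently guessing where the line is drawn.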

    If you're planning on doing this for the whole of the internet, you should search for a list of international domains (or try to hunt down some big logfiles).

    Good luck!
    Paul Davey
    webmaster for Whitford Church of Christ

  5. #5
    SitePoint Member burchyk's Avatar
    Join Date
    Mar 2002
    Posts
    14
    Split the domain with a regular expression.
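
    The post gives no pattern, so the following is only one guess at what a regular-expression approach might look like, written in Python. `HOST_RE` and `split_host` are hypothetical names, and the pattern is a simplification that assumes plain http or https URLs without userinfo or IPv6 literals.

```python
import re
from typing import List

# Captures everything between "http(s)://" and the first "/", ":", "?" or "#".
# Simplified on purpose: no userinfo, no IPv6 literals, no other schemes.
HOST_RE = re.compile(r"^https?://([^/:?#]+)", re.IGNORECASE)


def split_host(url: str) -> List[str]:
    """Return the hostname labels of a URL, or an empty list if none is found."""
    match = HOST_RE.match(url)
    if match is None:
        return []
    # Split the captured hostname on literal dots.
    return re.split(r"\.", match.group(1).lower())


if __name__ == "__main__":
    print(split_host("http://sub2.sub.domain.co.uk/index.php?t=1&v=2"))
```

    Splitting alone does not say which labels form the subdomain; the resulting list would still have to be compared against a list of known root domains.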

  6. #6
    AdSpeed.com Son Nguyen's Avatar
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Great answer, bobbymac!

    So do you, or does anyone, know of a list (a partial one is fine) of country domains (like com.au, edu.sg, com.vn...)?

    Thanks
    - Son Nguyen

  7. #7
    SitePoint Member
    Join Date
    Mar 2002
    Posts
    17

  8. #8
    AdSpeed.com Son Nguyen's Avatar
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241
    Could you give a specific link? Network Solutions isn't a small site.

    Thanks
    - Son Nguyen

  9. #9
    SitePoint Zealot
    Join Date
    Mar 2002
    Location
    Perth, Australia
    Posts
    157
    Paul Davey
    webmaster for Whitford Church of Christ

  10. #10
    AdSpeed.com Son Nguyen's Avatar
    Join Date
    Aug 2000
    Location
    Silicon Valley
    Posts
    2,241

    Cool

    Exactly what I wanted!
    Thanks bobbymac, you always have the answer I need
    - Son Nguyen

