  1. #1
    SitePoint Addict
    Join Date
    Oct 2010
    Posts
    207
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Link Checker tool

    How can I automatically check a report file for invalid links?

    For example, I run a data entry company where people submit data entry reports listing the links where they posted data.

    How can I check that the links they submit in the report are valid, i.e. whether they really posted the data or not? How can I check the links automatically?

    Thanks

  2. #2
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)
    Hi realcoder,

    You should write a validation function that parses input when it is submitted by the user. Inside the validation function you would do something like:

    PHP Code:
    $hostname = 'somebadmalforednonexistantdomain.com';
    if (validateDomainName($hostname) == 1) {
      /* write $hostname to db */
    } else {
      /* return error code and display, or don't write to db */
    }

    // Returns 1 if $hostname resolves to an IPv4 address, 0 otherwise.
    function validateDomainName($hostname = '') {
        // gethostbyname() returns the unmodified host name when the lookup fails
        $ip = gethostbyname($hostname);
        if (preg_match('/^(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/', $ip)) {
            return 1;
        } else {
            return 0;
        }
    }

    Quickly thrown together, but something along those lines.

    Regards,
    Steve
    ictus==""

  3. #3
    SitePoint Addict
    Join Date
    Oct 2010
    Posts
    207
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well, users provide lists of links where they have submitted data.
    I want to just put those links in a file and automatically find out which links give a good response and which return a 404. Is that possible?

  4. #4
    Foozle Reducer ServerStorm's Avatar
    Join Date
    Feb 2005
    Location
    Burlington, Canada
    Posts
    2,699
    Mentioned
    89 Post(s)
    Tagged
    6 Thread(s)
    Hi realcoder,

    You asked
    how can I check the report invalid links automatic in file
    The steps might be:
    1. read the file
      Code:
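      $filename = "list.txt";
      $list = getListFile($filename);

      // Returns the whole file as one string
      function getListFile($filename){
        $fh = fopen($filename, 'r');
        $theData = stream_get_contents($fh); // read the entire file, not just the first 5 bytes
        fclose($fh);
        return $theData;
      }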
    2. For the sake of this example $list = 'This is a file that has a couple of domains, first one is "http://www.liviam.ca" the next is "https://sitepoint.com", "http://example.com?id=4["';
    3. So you need to parse $list for clean domains whose DNS lookup checks out, so you do
      Code:
      $unclean_domains = doReg($list);
      
      $clean= array();
      foreach($unclean_domains as $domain){
        $domain = parse_url($domain);
        if( validateDomainName($domain['host']) == 1){
          $clean[] = $domain['host'];
        }
      }
      var_dump($clean);
      
      /* Functions */
      function doReg($string){
        $regex = '~(?:https?|irc|ftp|file)://(?:www)?.*?\.(?:com|net|info|org|ca)~i';
        preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
        return $result[0];
      }
      function validateDomainName($hostname = ''){
         $ip = gethostbyname($hostname );
          if (preg_match('/^(([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([1-9]?[0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/',$ip)) {
             return 1;
          } else {
             return 0;
          }
      }
    4. Var_Dumping $clean outputs
      array(3) { [0]=> string(13) "www.liviam.ca" [1]=> string(13) "sitepoint.com" [2]=> string(11) "example.com" }
      You can then loop through this array to grab each of the valid links.
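
    If the report is simply a file with one link per line, rather than free text, the regex step may not be needed. Here is a rough sketch under that assumption; hostsFromLinkFile() is a made-up name for illustration and is not part of the code above. It only collects the unique host names and does no DNS lookup itself.

```php
<?php
// Hypothetical helper (not from the code above): reads a file that is assumed
// to hold one submitted link per line and returns the unique host names in it.
function hostsFromLinkFile($filename)
{
    if (!is_readable($filename)) {
        return array(); // nothing to check if the file cannot be read
    }

    // One array element per line, without line endings or empty lines.
    $lines = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    $hosts = array();
    foreach ($lines as $line) {
        // parse_url() gives null when the line has no host part,
        // and false when the URL is badly malformed.
        $host = parse_url(trim($line), PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $hosts[strtolower($host)] = true; // array keys remove duplicates
        }
    }
    return array_keys($hosts);
}
```

    Each host returned could then be passed to validateDomainName() in a loop.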


    You don't want to rely on returned 404 codes, as many URLs can be made up that don't return a 404. What is demonstrated above takes a string, extracts the URLs from it, tests that each host resolves in DNS, strips any parameters, and returns a clean array of domains that you can put into a database, a file, or an object... whatever you want.
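
    Since the goal is a report of the bad links, one possible variation is to keep the original links and sort them into two groups instead of keeping only the good hosts. This is a sketch, not tested against real reports; partitionLinks() and its $resolver parameter are hypothetical names added for illustration (the parameter lets you try the function without touching the network), and it assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: splits submitted links into those whose host resolves
// in DNS and those whose host does not (or that have no host at all).
// $resolver is an assumption for illustration; when omitted, gethostbyname()
// is used, as in validateDomainName() above.
function partitionLinks(array $links, ?callable $resolver = null): array
{
    if ($resolver === null) {
        $resolver = function (string $host): bool {
            // gethostbyname() returns the unmodified host name on failure,
            // so a successful lookup is one that yields an IPv4 address.
            $ip = gethostbyname($host);
            return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
        };
    }

    $report = array('resolved' => array(), 'unresolved' => array());
    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        $ok = is_string($host) && $host !== '' && $resolver($host);
        $report[$ok ? 'resolved' : 'unresolved'][] = $link;
    }
    return $report;
}
```

    The 'unresolved' list is then the set of links to question in the report. A host that resolves still does not prove the data was really posted there; it only rules out links to domains that do not exist.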

    Regards,
    Steve
    ictus==""

