  1. #1
    Coastal Web (SitePoint Zealot)
    Join Date
    Jan 2006
    Location
    Oregon, U.S.
    Posts
    131

    Fetching URLs from a string?

    Greetings everyone,
    I'm trying to write a script, but I've run into a roadblock. After searching Google for a bit, I wasn't able to turn up anything that really helped me (I also searched the forum here to see if this has been asked before...).

    What I'm trying to do is create a function that will go through a string, extract all the URLs from the string, and return them as an array.

    For instance...

    <?php

    $str = <<<end
    Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. http://www.test.com/somefile.php Ut wisi enim ad minim veniam, quis nostrud exercitation ulliam corper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem veleum iriure dolor http://www.domain.com/files/deep/lin...d=123&user=123 in hendrerit in vulputate velit esse molestie consequat, vel willum lunombro dolore eu feugiat nulla facilisis at vero http://www.google.com/ eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
    end;

    // How would I create a function that goes through a string passed to it (similar to the string above), fetches all of the URLs within that string (if any), and returns those URLs in an array? In this case, three URLs would be returned in the array:
    // http://www.test.com/somefile.php
    // http://www.domain.com/files/deep/lin...d=123&user=123
    // http://www.google.com/


    ?>

    If anyone would be willing to help me out with this, it would be greatly appreciated.

    Thanks so much,
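For comparison with the strpos() loops in the replies, a regular expression can do the same job in a single call. This is a swapped-in technique, not something proposed in the thread: a minimal sketch using PHP's preg_match_all(), assuming a URL is `http://` or `https://` (any letter case) followed by non-whitespace characters, and assuming PHP 7 or later for the type declarations. The function name `extract_urls` is hypothetical.

```php
<?php
// Sketch only: regex alternative to the strpos() loops discussed in this thread.
// Assumption: a URL starts with http:// or https:// and runs until the next whitespace.
function extract_urls(string $text): array
{
    // "~" is the pattern delimiter, the "i" modifier makes "http" case-insensitive,
    // and \S+ consumes characters up to (not including) the next whitespace.
    preg_match_all('~https?://\S+~i', $text, $matches);
    // With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches in order.
    return $matches[0];
}

$str = 'Lorem ipsum http://www.test.com/somefile.php dolor sit amet http://www.google.com/ eros.';
print_r(extract_urls($str));
```

Because `\S+` stops only at whitespace, a full stop or comma written directly after a URL would be captured as part of it; the pattern would need tightening if the input text does that.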

  2. #2
    ldivinag (SitePoint Evangelist)
    Join Date
    Jan 2005
    Location
    N37 33* W122 3*
    Posts
    414


    a quick try:

    first, use strpos(), which finds the first occurrence of a substring in the string. it returns an integer position (or false if there is no match).

    so, assuming no spaces occur in the URL, you use the integer from strpos() as the starting point to search for the end of the URL string.

    again, if the string doesn't follow rules like including the scheme (http://) in the URLs, then you are kinda messed up.

    so once you have strpos'ed the first one, keep track of your position in the string and just continue on from there...

    so, assuming we only have HTTP and nothing else...
    PHP Code:
    $url = array();
    $len_str = strlen($str);
    $current_pos = 0;
    $url_end = 0;
    while (($url_start = strpos($str, "HTTP://", $current_pos)) or ($current_pos > $len_str))
    {
        $url_end = strpos($str, " ", $url_start);
        //  now we assume we have the start of the URL segment and the end...
        //  substr() takes a length, so subtract the start position from the end position
        $url[] = substr($str, $url_start, $url_end - $url_start);
        $current_pos = $url_end + 1;
    }

    issues:

    1. what if a URL is at the end of the string? it won't have a SPACE after it, so handle that situation.
    leo d.
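A sketch of the strpos() approach described in this reply, with the issues raised in the thread handled: a case-insensitive search via stripos(), a strict `!== false` comparison (a match at position 0 is falsy), and a URL at the very end of the string. It assumes URLs start with `http://` and end at the next space or at the end of the string; the function name `find_urls` is hypothetical, and PHP 7+ is assumed for the type declarations.

```php
<?php
// Sketch of the strpos()-style loop. Assumptions: URLs start with "http://"
// (any letter case) and end at the next space or at the end of the string.
function find_urls(string $str): array
{
    $urls = array();
    $len_str = strlen($str);
    $current_pos = 0;

    // stripos() is case-insensitive; compare with !== false because a match at
    // position 0 would be treated as false in a plain boolean test. The
    // $current_pos < $len_str guard keeps the search offset inside the string.
    while ($current_pos < $len_str
        && ($url_start = stripos($str, 'http://', $current_pos)) !== false) {
        $url_end = strpos($str, ' ', $url_start);
        if ($url_end === false) {
            // No space after the URL: it runs to the end of the string.
            $url_end = $len_str;
        }
        // substr()'s third argument is a length, not an end position.
        $urls[] = substr($str, $url_start, $url_end - $url_start);
        // Resume searching just past the end of this URL.
        $current_pos = $url_end + 1;
    }
    return $urls;
}

print_r(find_urls('http://www.test.com/a.php text HTTP://www.google.com/'));
```

Here the second URL has no trailing space, so it exercises the end-of-string case from the issue list above.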

  3. #3
    logic_earth
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Real (properly formed) URLs should not contain a space. There is a problem with your code, ldivinag: $current_pos > $len_str causes an infinite loop, and strpos is case-sensitive.

    PHP Code:
    <?php

    header('content-type: text/plain');

    $str  = 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. http://www.test.com/somefile.php Ut wisi enim ad minim veniam, quis nostrud exercitation ulliam corper suscipit lobortis nisl ut aliquip ex ea commodo consequat. Duis autem veleum iriure dolor http://www.domain.com/files/deep/lin...d=123&user=123 in hendrerit in vulputate velit esse molestie consequat, vel willum lunombro dolore eu feugiat nulla facilisis at vero http://www.google.com/ eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi.';

    # Add an extra space to the end just in case.
    $str .= ' ';

    $urls = array();
    $len  = strlen($str);
    $cpos = $spos = $epos = 0;

    # PHP5 stripos - Find position of first occurrence of a case-insensitive string
    # (compare with !== false, since a match at position 0 is falsy)
    while (($spos = stripos($str, 'http://', $cpos)) !== false) {
        $epos = strpos($str, ' ', $spos);
        $urls[] = substr($str, $spos, $epos - $spos); # substr() takes a length: end - start
        $cpos = $epos + 1;
    }

    print_r($urls);
    Last edited by logic_earth; Aug 12, 2007 at 01:22.
    Logic without the fatal effects.
    All code snippets are licensed under WTFPL.
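When turning the positions found by strpos() into a substring, note that substr()'s third argument is a length, not an end position. A short illustration with a made-up sample string:

```php
<?php
// Illustration: substr($string, $start, $length) takes a LENGTH as its third
// argument, so passing an end position grabs too much text.
$str   = 'see http://www.google.com/ and more text';
$start = strpos($str, 'http://');   // 4
$end   = strpos($str, ' ', $start); // position of the space after the URL

$too_long = substr($str, $start, $end);          // $end used as a length: overshoots
$exact    = substr($str, $start, $end - $start); // length = end - start

var_dump($too_long, $exact);
```

Here `$too_long` is `http://www.google.com/ and`, while `$exact` is just the URL.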


  4. #4
    ldivinag (SitePoint Evangelist)
    Join Date
    Jan 2005
    Location
    N37 33* W122 3*
    Posts
    414
    Quote, originally posted by logic_earth:
    Real (properly formed) URLs should not contain a space. There is a problem with your code, ldivinag: $current_pos > $len_str causes an infinite loop, and strpos is case-sensitive.

    oops... it's not an infinite loop. Since the condition will always be FALSE at the start, it will NEVER get into the loop.

    should be

    PHP Code:
    ($current_pos < $len_str)
    leo d.
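To check the exchange about the original loop condition, a small sketch that evaluates ldivinag's first condition once at `$current_pos = 0`, leaving out the `$url_start` assignment (the helper name `loop_would_start` is hypothetical). It suggests the answer depends on the input: with an all-lowercase `http://` the condition is false on the first check, so the body never runs; an uppercase `HTTP://` after position 0 does start the loop; and a match at position 0 is treated as false.

```php
<?php
// Illustration only: evaluates the original condition
//   (strpos($str, "HTTP://", $current_pos)) or ($current_pos > $len_str)
// once, at $current_pos = 0, to show whether the loop body would ever start.
function loop_would_start(string $str): bool
{
    $len_str = strlen($str);
    $current_pos = 0;
    // strpos() is case-sensitive: "HTTP://" does not match "http://".
    return (strpos($str, "HTTP://", $current_pos) or ($current_pos > $len_str));
}

var_dump(loop_would_start('go to http://www.google.com/ now')); // lowercase: no match
var_dump(loop_would_start('go to HTTP://www.google.com/ now')); // match at position 6
var_dump(loop_would_start('HTTP://www.google.com/'));           // match at position 0 is falsy
```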

  5. #5
    logic_earth
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Quote, originally posted by ldivinag:
    oops... it's not an infinite loop. Since the condition will always be FALSE at the start, it will NEVER get into the loop.
    No, it is an infinite loop because you used an OR expression. Once it didn't find any more 'http://', it would go on to check whether $current_pos is greater than strlen = true = loop.

    You don't really need that condition at all: once strpos doesn't find any more 'http://', the loop will just end there.