Thread: Simple regex

  1. #1
    SitePoint Member
    Join Date
    Apr 2005
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Question Simple regex

    Hi,

    I'm not a PHP programmer,
    but I'd like to know how to turn:

    Code:
    <img src="http://localhost/txp/images/23.jpg" width="2272" height="1704" alt="" />
    into

    Code:
    http://localhost/txp/images/23.jpg
    using PHP (does this involve regular expressions?)

    Can somebody please help me?

    kind regards,

    Johan

  2. #2
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    PHP Code:
    preg_match('#<img src="(.+?)".+?/>#i', $data, $matches);

    echo $matches[1];
    Saul

  3. #3
    SitePoint Member
    Join Date
    Apr 2005
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you for your reply...
    I looked into http://be2.php.net/preg_match,
    but I'm not at all a PHP expert.

    Would you be so kind as to write a simple script where I can input

    Code:
    <img src="http://localhost/txp/images/23.jpg" width="2272" height="1704" alt="" />
    (as a variable)

    and that produces

    Code:
    http://localhost/txp/images/23.jpg
    (as a variable)

  4. #4
    SitePoint Member
    Join Date
    Apr 2005
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm looking into example 1509 at the moment,
    I hope I'm able to adapt it to my needs

  5. #5
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    My snippet does just that, if the $data is your html with an image.
    Saul

  6. #6
    SitePoint Member
    Join Date
    Apr 2005
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK, I get it now:
    Code:
    <?php
    $data= '<img src="http://localhost/txp/images/23.jpg" width="2272" height="1704" alt="" />';
    
    preg_match('#<img src="(.+?)".+?/>#i',$data,$matches);
    
    echo $matches[1];
    ?>
    (I had some problems with using double quotes, resulting in "unexpected T_STRING" parse errors...)
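
    A likely cause of that parse error is an unescaped double quote inside a double-quoted PHP string, which ends the string early. A minimal sketch of three ways to write the same HTML as a string (the variable names are only illustrative):

```php
<?php
// 1. Single-quoted string: inner double quotes need no escaping.
$single = '<img src="http://localhost/txp/images/23.jpg" alt="" />';

// 2. Double-quoted string: every inner double quote must be escaped with a backslash.
$double = "<img src=\"http://localhost/txp/images/23.jpg\" alt=\"\" />";

// 3. Nowdoc: no escaping at all, handy for longer snippets.
$nowdoc = <<<'HTML'
<img src="http://localhost/txp/images/23.jpg" alt="" />
HTML;

// All three hold exactly the same text.
var_dump($single === $double, $double === $nowdoc);
```

    The single-quoted form is what the script above uses, and it is usually the least error-prone choice for HTML snippets.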

    Thank you for your fast solution!!

    Johan.

  7. #7
    SitePoint Member
    Join Date
    Apr 2005
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Maybe one more thing:

    would you please explain what

    Code:
    #<img src="(.+?)".+?/>#i
    does in plain English
    (I understand the rest of the script: the variables, the array that gets outputted,...).

    I don't know a thing about regex,
    but I'd like to understand what this regex is doing.

    Johan.

  8. #8
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    "(.+?)" matches everything between the quotes: one or more characters, but as few as possible, so it stops at the first closing quote. The other .+? works the same way and matches everything up to the closing />. The parentheses mark the substring to capture, which you access in the array of matches (in the order of the parentheses).

    The i modifier makes the match case-insensitive.
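
    To make the explanation concrete, here is a small sketch that prints what ends up in the matches array for the tag from the first post. The second pattern, with a greedy (.+), is not from this thread; it is added only to show why the ? after .+ matters.

```php
<?php
$data = '<img src="http://localhost/txp/images/23.jpg" width="2272" height="1704" alt="" />';

// Lazy (.+?): stops at the first closing quote after src=".
preg_match('#<img src="(.+?)".+?/>#i', $data, $lazy);

// Greedy (.+), for contrast only: runs on to the last quote that still lets the rest match.
preg_match('#<img src="(.+)".+?/>#i', $data, $greedy);

echo $lazy[0], PHP_EOL;   // index 0: the whole matched text, here the complete tag
echo $lazy[1], PHP_EOL;   // index 1: the first parenthesised group, here only the URL
echo $greedy[1], PHP_EOL; // the URL plus attributes that follow it, which is not what we want
```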

    You might wanna read up at http://www.php.net/manual/en/referen...ern.syntax.php
    Saul

  9. #9
    SitePoint Member
    Join Date
    Apr 2005
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks! You really helped me out.

  10. #10
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    You are very welcome.
    Saul

  11. #11
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The regex gets a lot trickier if you want to take into consideration that the src doesn't have to come first.
    Using your unpaid time to add free content to SitePoint Pty Ltd's portfolio?

  12. #12
    SitePoint Member
    Join Date
    Apr 2005
    Posts
    24
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hmm, I didn't know that.
    Isn't it an unwritten rule to put it first?

    Anyway, in the context I'm using it, the src comes first.
    I'm using it to implement Lightbox in the Textpattern CMS without installing a plugin...

  13. #13
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Nothing tricky about that really:
    Code:
    #<img.+?src="(.+?)".+?/>#i
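
    A quick comparison of the two patterns, assuming a made-up tag in which alt comes before src:

```php
<?php
// Hypothetical input: the src attribute is not the first attribute.
$data = '<img alt="photo" src="http://localhost/txp/images/23.jpg" />';

// Original pattern: expects src directly after "<img ".
$strict = preg_match('#<img src="(.+?)".+?/>#i', $data, $m1);

// Looser pattern: allows anything between "<img" and src.
$loose = preg_match('#<img.+?src="(.+?)".+?/>#i', $data, $m2);

var_dump($strict);     // int(0): no match
echo $m2[1], PHP_EOL;  // http://localhost/txp/images/23.jpg
```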
    Saul

  14. #14
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by php_daemon View Post
    Nothing tricky about that really:
    Code:
    #<img.+?src="(.+?)".+?/>#i
    How about <img lowsrc="blah">?
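
    The sketch below shows the problem with a made-up tag: the quoted pattern finds "src" inside "lowsrc" and captures the wrong value. The second pattern is only one possible tightening (an assumption, not something posted in this thread): it requires whitespace before the attribute name and still handles double-quoted values only.

```php
<?php
// Hypothetical input with a lowsrc attribute ahead of the real src.
$data = '<img lowsrc="low.jpg" src="high.jpg" />';

// The quoted pattern: "src" also matches the tail of "lowsrc".
preg_match('#<img.+?src="(.+?)".+?/>#i', $data, $loose);
echo $loose[1], PHP_EOL;    // low.jpg (the wrong attribute)

// Assumed tightening: src must be preceded by whitespace, so "lowsrc" no longer qualifies.
preg_match('#<img\b[^>]*?\ssrc\s*=\s*"([^"]*)"#i', $data, $anchored);
echo $anchored[1], PHP_EOL; // high.jpg
```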

  15. #15
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Code:
    #<img.*src=(?:"|\')?([^\s\'"]+)(?:"|\')?.*>#i
    Saul

  16. #16
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    This won't work either. The complete regexp (although OP obviously doesn't need it) should also be able to match cases like
    Code:
    <img onclick="alert(this.src='blah'>0)" losrc='foo' src   = "bar">

  17. #17
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Yes, of course, you can always mess things up and break just about any regexp.
    Saul

  18. #18
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    any poorly written regexp, yes.

  19. #19
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    My point was going to be "use an xml parser".
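
    As a rough sketch of that idea, the snippet below uses PHP's DOM extension (its HTML parser rather than a strict XML parser) on the awkward tag from post #16. The helper name extractImageSources is made up for this example, and the code assumes the DOM extension is enabled.

```php
<?php
/**
 * Hypothetical helper: returns the src value of every <img> in an HTML string.
 */
function extractImageSources(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Only run the demo if the DOM extension is available.
if (class_exists('DOMDocument')) {
    $tag = '<img onclick="alert(this.src=\'blah\'>0)" losrc=\'foo\' src   = "bar">';
    print_r(extractImageSources($tag));
}
```

    The parser deals with quoting, attribute order and the spaces around the equals sign, so no pattern has to anticipate them.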
    Using your unpaid time to add free content to SitePoint Pty Ltd's portfolio?

  20. #20
    ✯✯✯ silver trophybronze trophy php_daemon's Avatar
    Join Date
    Mar 2006
    Posts
    5,284
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by stereofrog View Post
    any poorly written regexp, yes.
    Do show us the right one for this matter.
    Saul

  21. #21
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    HTML grammar is pretty straightforward, it shouldn't be a problem for anyone to translate it into regexp. See e.g. "Matching an HTML tag" from Friedl's book.

  22. #22
    SitePoint Addict Jasper Bekkers's Avatar
    Join Date
    May 2007
    Location
    The Netherlands
    Posts
    282
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by php_daemon View Post
    Code:
    #<img.*src=(?:"|\')?([^\s\'"]+)(?:"|\')?.*>#i
    You should be using back-references here, because <img src="http://../'> is not allowed. Then again, that will get messy since the quotes are optional.
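
    A minimal sketch of the back-reference idea, assuming the value is always quoted (unquoted values are exactly the messy part and are not handled here). The pattern is an illustration, not a complete solution:

```php
<?php
// (["\']) captures the opening quote; \1 demands the same character as the closing quote.
$pattern = '#<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1#i';

// Matching quotes: accepted, the value is in group 2.
$ok = preg_match($pattern, "<img alt='' src='a.jpg'>", $m);

// Mismatched quotes (opens with ", closes with '): rejected.
$bad = preg_match($pattern, '<img src="a.jpg\'>', $unused);

var_dump($ok, $m[2], $bad);
```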

    PHP Code:
    // $theUri: URL or path of the HTML page to scan
    $DOM = new DOMDocument();
    $DOM->loadHTMLFile($theUri);
    $imageTags = $DOM->getElementsByTagName('img');
    foreach ($imageTags as $theTag)
    {
          if ($theTag->hasAttribute('src'))
                echo "Source: ", $theTag->getAttribute('src'), PHP_EOL;
          else
                echo "No source tag found!", PHP_EOL;
    }


