  1. #1
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94

    Regular Expression Help

    I'm slowly learning how to use regular expressions, and I need some help here.

    I want to change the path in a bunch of img tags to get rid of all the directories and add /images/
    so that:

    HTML Code:
    <img src="images/image/folder/image.png" alt="alt text" />
    <img src="image.png" alt="alt text" />
    <img src="/folder/image.png" alt="alt text" />
    will all end up like this:
    HTML Code:
    <img src="images/image.png" alt="alt text" />
    But if it's a full URL (starting with http://), it shouldn't be touched.

    Any help would be appreciated
    Thanks.

  2. #2
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    There's probably a better way to do this but see if this works for you:
    PHP Code:
    function chgSrc($matches)
    {
      if (preg_match('~^https?:\/\/~', $matches[1])) return $matches[0];
      return str_replace($matches[1], '/images/'.basename($matches[1]), $matches[0]);
    }

    $pattern  = "/@import\s+[\"'`][\w:?=@&\/#._;-]+[\"'`];|";
    $pattern .= ":\s*url\s*\([\s\"'`]*[\w:?=@&\/#._;-]+";
    $pattern .= "[\s\"'`]*\)|<[^>]*\s+src\=[\s\"'`]*";
    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>/i";
    $text = preg_replace_callback($pattern, 'chgSrc', $yourhtml);
    echo $text;
    My regex is a bit rusty so hopefully a regex god will provide a better solution

  3. #3
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    Hi,
    Thanks for the reply. It's not working; for some reason the PHP gets an error.

  4. #4
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    What error?

  5. #5
    SitePoint Wizard silver trophy
    Join Date
    Mar 2006
    Posts
    6,132
    Quote Originally Posted by LobsterMan
    Hi,
    Thanks for the reply. It's not working; for some reason the PHP gets an error.
    PHP error messages are generally descriptive of the problem.

    so, what is the error?

  6. #6
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    Sorry, there's no error, it just returns nothing.
    I set $yourhtml to
    PHP Code:
    $yourhtml = '<img src="fonder/images/hello.png" alt="" />';

  7. #7
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Use 'view source' in your browser; don't forget it's going to echo out HTML image tags.

  8. #8
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    Nothing. Triple checked.
    I also added a dummy echo afterwards to make sure the script is executing.

  9. #9
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Weird. Works ok for me
    PHP Code:
    <?php

    function chgSrc($matches)
    {
      if (preg_match('~^https?:\/\/~', $matches[1])) return $matches[0];
      return str_replace($matches[1], '/images/'.basename($matches[1]), $matches[0]);
    }

    $yourhtml = '<img src="fonder/images/hello.png" alt="" />';

    $pattern  = "/@import\s+[\"'`][\w:?=@&\/#._;-]+[\"'`];|";
    $pattern .= ":\s*url\s*\([\s\"'`]*[\w:?=@&\/#._;-]+";
    $pattern .= "[\s\"'`]*\)|<[^>]*\s+src\=[\s\"'`]*";
    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>/i";
    $text = preg_replace_callback($pattern, 'chgSrc', $yourhtml);
    echo $text;

    ?>
    Echoes out: <img src="/images/hello.png" alt="" />

  10. #10
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    OK, got it to work. Copying from Safari probably messes up the encoding; I copied from Firefox and it works fine. Thanks a million!!!

  11. #11
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    How would I filter out addresses that start with mailto:, just like you did with http:// and https://?

  12. #12
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    Also, if this isn't asking for too much, I'd like to understand how this works, so I'm not dependent on nice guys in forums for the rest of my life...
    You see, I also want to do the same thing for CSS url(url).
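
    A minimal sketch of the same idea applied to CSS url(...) values. The callback name chgCssUrl, the pattern and the sample CSS are illustrative assumptions, not code from the thread; the pattern captures an optional quote and then the path, so the callback can rebuild the url(...) with the same quoting.

```php
<?php

// Hypothetical callback (not from the thread): rewrites a CSS url(...) value
// to /images/<filename>, leaving absolute URLs untouched.
function chgCssUrl($matches)
{
    // $matches[1] is the optional quote, $matches[2] the path inside url(...).
    if (preg_match('~^(?:https?://|ftp://)~i', $matches[2])) {
        return $matches[0];
    }
    return 'url(' . $matches[1] . '/images/' . basename($matches[2]) . $matches[1] . ')';
}

// Sample input, made up for illustration.
$css = 'body { background: url("img/bg/tile.png"); } '
     . 'h1 { background: url(http://example.com/h.png); }';

// \1 requires the closing quote to be the same as the opening one (or absent).
$cssPattern = '~url\(\s*(["\']?)([^"\')\s]+)\1\s*\)~i';

$cssResult = preg_replace_callback($cssPattern, 'chgCssUrl', $css);
echo $cssResult;
```

    Only the relative path is rewritten; the http:// URL in the second rule is returned as it was.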

  13. #13
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    PHP Code:
    if (preg_match('~^([htf]+p://)|(mailto:)~', $matches[1])) return $matches[0];
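
    One thing to watch in the pattern above: ^ only applies to the first alternative, so mailto: is matched anywhere in the string, and [htf]+p:// does not cover https://. A sketch of a stricter check follows; the helper name isAbsoluteUrl and the sample values are assumptions made for illustration.

```php
<?php

// Hypothetical helper (not from the thread): true when $src starts with
// http://, https://, ftp:// or mailto:.
function isAbsoluteUrl($src)
{
    // (?: ... ) groups the alternatives so that ^ anchors all of them.
    return preg_match('~^(?:https?://|ftp://|mailto:)~i', $src) === 1;
}

var_dump(isAbsoluteUrl('https://example.com/a.png')); // bool(true)
var_dump(isAbsoluteUrl('mailto:foo@bar.com'));        // bool(true)
var_dump(isAbsoluteUrl('folder/mailto:oops.png'));    // bool(false)
var_dump(isAbsoluteUrl('folder/image.png'));          // bool(false)
```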

  14. #14
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    This also changes the src for JavaScript. I want it to change for img only.
    Oh, I wish I knew how to do this stuff...

  15. #15
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Quote Originally Posted by LobsterMan
    Also, if this isn't asking for too much, I'd like to understand how this works, so I'm not dependent on nice guys in forums for the rest of my life...
    You see, I also want to do the same thing for CSS url(url).
    No problem, it's only right that you should want to understand
    Below is a commented version (or as best I could comment it):
    PHP Code:
    <?php

    //This function takes an array, $matches. $matches[0] will be the
    //complete match, <img src=".." alt=".." />, and $matches[1] will be the
    //bit we are interested in, the value inside src="...."
    //It then simply replaces the bit we are interested in with our new src text,
    // /images/theimage.jpg, for example
    //basename gives us the filename part of the path/string
    //If the bit we are interested in begins with http://, ftp:// or contains mailto: then
    //we return it unchanged (note that https:// is not covered by this pattern)
    //In regex ^ means 'begin' (unless it's inside square brackets, in which case it means 'not')
    //So ^([htf]+p://)|(mailto:) means:
    //If it begins with any combination of h, t or f (which includes htt, ft), followed by p:// OR
    //contains mailto: (the ^ only applies to the first alternative)
    function chgSrc($matches)
    {
      if (preg_match('~^([htf]+p://)|(mailto:)~', $matches[1])) return $matches[0];
      return str_replace($matches[1], '/images/'.basename($matches[1]), $matches[0]);
    }

    //The string containing the html
    $yourhtml = '<img src="fonder/images/hello.png" alt="" /> <a href="mailto:foo@bar.com" src="/cheese/bar.png" /> <img src="/one/tweo/three.jpg" alt="yay" />';

    //The below lines form the regular expression that we'll use
    //to extract the text between the src=" .... " part of the html
    //$pattern could all go on one big line but I've split it up for easy reading
    $pattern  = "/@import\s+[\"'`][\w:?=@&\/#._;-]+[\"'`];|";
    $pattern .= ":\s*url\s*\([\s\"'`]*[\w:?=@&\/#._;-]+";
    $pattern .= "[\s\"'`]*\)|<[^>]*\s+src\=[\s\"'`]*";
    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>/i";

    //Below we pass the complete match ($matches[0]) and the
    //bit we are interested in ($matches[1]), which is the bit enclosed in
    //brackets in $pattern ... ([\w:?=@&\/#._;-]+), to the function chgSrc
    $text = preg_replace_callback($pattern, 'chgSrc', $yourhtml);
    echo $text;

    ?>
    The regex for $pattern is a bit out of the scope of this forum and is really something you get worse at the more you practise .. er .. get better at
    There are plenty of regular expression sites out there that will help clear it up.

  16. #16
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Quote Originally Posted by LobsterMan
    This also changes the src for JavaScript. I want it to change for img only.
    Oh, I wish I knew how to do this stuff...
    Here's where my regex starts to get a bit shaky. But you could do:
    PHP Code:
    if (preg_match('~^([htf]+p://)|(mailto:)~', $matches[1])) return $matches[0];
    if (preg_match('~[^(png|gif|jpg|jpeg)]$~', $matches[1])) return $matches[0];
    That should also only allow src=".." that end with png, gif, jpg or jpeg. There's bound to be a cleaner way of doing it that combines them into one but my brain refuses to attempt it.
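
    Worth noting: inside square brackets, (png|gif|jpg|jpeg) is not an alternation. [^(png|gif|jpg|jpeg)] matches one character that is not any of those letters or symbols, so the check above only looks at the last character. A sketch of an img-only version with an explicit extension list follows; the names chgImgSrc and $imgPattern and the sample HTML are illustrative assumptions, not code from the thread.

```php
<?php

// Hypothetical callback (names are illustrative): rewrites the src of an
// <img> tag to /images/<filename>, leaving absolute URLs and non-image
// files untouched.
function chgImgSrc($matches)
{
    $src = $matches[1];

    // Leave http://, https://, ftp:// and mailto: addresses alone.
    if (preg_match('~^(?:https?://|ftp://|mailto:)~i', $src)) {
        return $matches[0];
    }

    // Only touch files whose extension is on the allow-list.
    if (!preg_match('~\.(?:png|gif|jpe?g)$~i', $src)) {
        return $matches[0];
    }

    return str_replace($src, '/images/' . basename($src), $matches[0]);
}

// Matches only <img ...> tags, so <script src="..."> never reaches the callback.
$imgPattern = '~<img\b[^>]*\ssrc=["\']([^"\'>]+)["\'][^>]*>~i';

$html = '<script src="js/app.js"></script> <img src="a/b/c.png" alt="" />';
$imgResult = preg_replace_callback($imgPattern, 'chgImgSrc', $html);
echo $imgResult;
```

    Restricting the pattern to <img means the script tag is never matched in the first place; the extension check is a second line of defence for img tags that point at something other than an image.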

  17. #17
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    markl999
    Thanks a million, you're amazing, gonna study it now...

  18. #18
    SitePoint Member mgraphic's Avatar
    Join Date
    Oct 2006
    Location
    West Hartford, CT
    Posts
    13
    Quote Originally Posted by markl999
    Here's where my regex starts to get a bit shaky. But you could do:
    PHP Code:
    if (preg_match('~^([htf]+p://)|(mailto:)~', $matches[1])) return $matches[0];
    if (preg_match('~[^(png|gif|jpg|jpeg)]$~', $matches[1])) return $matches[0];
    That should also only allow src=".." that end with png, gif, jpg or jpeg. There's bound to be a cleaner way of doing it that combines them into one but my brain refuses to attempt it.
    What does the tilde (~) do in the regex? I have not seen that used before.

    <mgraphic /> - It not a bug, Its an undocumented "feature" !

  19. #19
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's just the delimiter that starts and ends the expression. Usually you'll see / being used (/[0-9a-z]/ etc.), but when dealing with URLs I prefer a delimiter that most likely won't appear anywhere in the pattern, so it's easier to read.
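
    A small sketch of the difference, using a made-up URL: the two calls below test the same expression, only the delimiter changes.

```php
<?php

// Sample value, made up for illustration.
$url = 'http://example.com/images/a.png';

// With / as the delimiter, every slash in the pattern must be escaped.
$withSlash = preg_match('/^https?:\/\//', $url);

// With ~ as the delimiter, the slashes can be written as-is.
$withTilde = preg_match('~^https?://~', $url);

var_dump($withSlash === $withTilde); // bool(true)
```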

  20. #20
    SitePoint Wizard cranial-bore's Avatar
    Join Date
    Jan 2002
    Location
    Australia
    Posts
    2,634
    LobsterMan -- you should check out the recent posts by Harry Fuecks in the SitePoint PHP blog about regular expressions. It's a 3-part guide, guaranteed to help you understand them better.

  21. #21
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    markl999: Thanks again for the explanation. I think I pretty much understand how it works, except how you selected part of the regex pattern as a variable. I understand how to make matches, but in this case I need parts of the string to make the match (like src=") and I don't want to select the whole thing. What's the magic for extracting the bit?

    cranial-bore:
    Thanks, I'll definitely check it out.

  22. #22
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Quote Originally Posted by LobsterMan
    markl999: Thanks again for the explanation. I think I pretty much understand how it works, except how you selected part of the regex pattern as a variable. I understand how to make matches, but in this case I need parts of the string to make the match (like src=") and I don't want to select the whole thing. What's the magic for extracting the bit?

    cranial-bore:
    Thanks, I'll definitely check it out.
    PHP Code:
    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>/i";
    It's the brackets on that line that do the magic.
    You could put brackets around any other parts of the pattern and capture those too. The entire match goes into $matches[0], then the stuff in brackets goes into the other 'slots'. So if you have one set of brackets (as you do above), the text matched inside the brackets goes into $matches[1]. If you were to add some brackets earlier on in the pattern, you'd have $matches[0] (the entire match), $matches[1] (the match inside the first brackets) and $matches[2] (the last set of brackets).
    I haven't really explained that very well, so I hope it makes at least some sense.
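
    The slots described above can be seen with a short sketch; the tag and the pattern here are made up for illustration and are much simpler than $pattern.

```php
<?php

// Sample tag, made up for illustration.
$tag = '<img src="folder/hello.png" alt="greeting" />';

// Two sets of brackets, so two capture slots after the full match.
preg_match('~<img src="([^"]+)" alt="([^"]*)"~', $tag, $matches);

// $matches[0] is the entire match: <img src="folder/hello.png" alt="greeting"
// $matches[1] is the first set of brackets: folder/hello.png
// $matches[2] is the second set of brackets: greeting
print_r($matches);
```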

  23. #23
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    Oops, major problem here: if the image names (the basenames) are not all the same, it only does the first one. I need to parse whole HTML docs. What do I do?

  24. #24
    Sell crazy someplace else markl999's Avatar
    Join Date
    Aug 2003
    Location
    Manchester, UK
    Posts
    4,007
    Quote Originally Posted by LobsterMan
    Oops, major problem here: if the image names (the basenames) are not all the same, it only does the first one. I need to parse whole HTML docs. What do I do?
    There's nothing in the code (above) that would make it do that. Maybe post the code you're currently using?

    Actually, one thing might cause that sort of behaviour, and that's newlines in the file/HTML you're using. You should add an 'm' onto the end of the pattern, so it looks like:
    PHP Code:
    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*>/im";
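
    For reference, in PCRE the m modifier only changes where ^ and $ match. The pattern above uses neither, so newlines in the HTML may not be the cause here. A short sketch with made-up strings:

```php
<?php

// Sample text, made up for illustration.
$text = "first line\nsrc on second line";

// Without m, ^ matches only at the very start of the string.
$withoutM = preg_match('~^src~', $text);

// With m, ^ also matches right after each newline.
$withM = preg_match('~^src~m', $text);

// A negated class such as [^>]* already spans newlines, with or without m.
$spansNewline = preg_match('~<img[^>]*>~', "<img\n src=\"a.png\">");

var_dump($withoutM, $withM, $spansNewline); // int(0) int(1) int(1)
```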

  25. #25
    SitePoint Enthusiast LobsterMan's Avatar
    Join Date
    Apr 2005
    Location
    Jerusalem, Israel
    Posts
    94
    Sorry, I just realized your code works great. The problem started when I tried making something based on it to filter URLs in CSS in the format url(image.png):

    PHP Code:
    $pattern  = "/@import\s+[\"'`][\w:?=@&\/#._;-]+[\"'`];|";
    $pattern .= ":\s*url\s*\([\s\"'`]*[\w:?=@&\/#._;-]+";
    $pattern .= "[\s\"'`]*\)|url\(";
    $pattern .= "([\w:?=@&\/#._;-]+)[\s\"'`]*[^>]*\)/i";
    Obviously I got something wrong...

    Also, based on your explanation of how things work, shouldn't this work?
    PHP Code:
    $pattern = "<img[.]*src=\"[[^\"]*]\"[.]*>";
    Why doesn't it?
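
    A few things stand out in that last pattern. Inside square brackets a dot is a literal dot, so [.]* matches only a run of dots and cannot get past the space after img. Square brackets define a character class, so [[^\"]*] does not capture anything; round brackets are what fill $matches[1]. And because the string starts with < and ends with >, PHP treats those two characters as the delimiters rather than as part of the expression. A sketch of a version that addresses these points, with a made-up tag and an explicit ~ delimiter:

```php
<?php

// Sample tag, made up for illustration.
$html = '<img class="x" src="folder/hello.png" alt="" />';

// . (outside square brackets) matches any character; [.] matches only a dot.
// Round brackets capture; square brackets define a character class.
// [^>]* is used instead of .* so the match cannot run past the end of the tag.
$pattern = '~<img[^>]*src="([^"]*)"[^>]*>~i';

$captured = null;
if (preg_match($pattern, $html, $matches)) {
    $captured = $matches[1];
}

echo $captured; // folder/hello.png
```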

