  1. #1
    SitePoint Addict
    Join Date
    Jul 2004
    Location
    Brooklyn, NY
    Posts
    316

    need a little RegEx help

    I have some very basic code that takes a newline and changes it to a <br/>
    Code:
    post=post.replace(/[^>]\n/g, "<br/>\n");
    basically it takes this:
    test=
    <b>Hello</b>
    test
    and turns it into this:
    test<br/>
    <b>Hello</b>
    tes<br/>
    the problem is it deletes the last character, so test turns into tes<br/>

    how do I keep it from deleting the last character?

  2. #2
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    the problem is it deletes the last character, so test turns into tes<br/>
    It also deletes the '=' sign, so the behavior is consistent. The characters 't\n' match the pattern "not a '>', followed by a \n", and the whole match, including the 't', is replaced. So if this happens:

    test=\n
    test<br/>

    then this will happen:

    test\n
    tes<br/>

    You can't have the replacement for the '=' in the first one and then not have the replacement for the 't' in the second one (unless you make your regex less general). You could do this:

    post=post.replace(/([^>])\n/g, "$1<br/>\n");

    $1 grabs the actual match to the first parenthesized grouping in the regex ($2 does the same for the second parenthesized grouping, etc.).
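To see the difference side by side, here is a minimal sketch. The sample string (with a trailing newline) and the variable names are made up for illustration; they are not from the original post.

```javascript
// Hypothetical sample: a line ending in "=", a tag line, and a plain line.
var sample = "test=\n<b>Hello</b>\ntest\n";

// No group: the character matched by [^>] is part of the match,
// so it is thrown away together with the newline.
var lossy = sample.replace(/[^>]\n/g, "<br/>\n");

// With a group: $1 puts the matched character back in front of the <br/>.
var kept = sample.replace(/([^>])\n/g, "$1<br/>\n");

console.log(JSON.stringify(lossy)); // "test<br/>\n<b>Hello</b>\ntes<br/>\n"
console.log(JSON.stringify(kept));  // "test=<br/>\n<b>Hello</b>\ntest<br/>\n"
```

The newline after </b> is left alone in both cases because the character in front of it is a '>'.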

    You should also know that a newline is different on different operating systems:

    windows: \r\n
    mac: \r (classic Mac OS; Mac OS X and later use \n)
    unix: \n
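If you are not sure which of these a given string contains, a quick check along these lines can help. This is only a sketch; describeLineEndings is a made-up helper name, not a built-in.

```javascript
// Hypothetical helper: counts each line-ending style found in a string.
function describeLineEndings(text) {
  // Windows style: \r immediately followed by \n.
  var crlf = (text.match(/\r\n/g) || []).length;
  // Lone CR: a \r that is not followed by \n.
  var cr = (text.match(/\r(?!\n)/g) || []).length;
  // Lone LF: all \n characters minus those that belong to a \r\n pair.
  var lf = (text.match(/\n/g) || []).length - crlf;
  return { crlf: crlf, cr: cr, lf: lf };
}

// One of each style in the sample string.
console.log(describeLineEndings("a\r\nb\rc\nd")); // { crlf: 1, cr: 1, lf: 1 }
```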
    Last edited by 7stud; Feb 13, 2007 at 16:14.

  3. #3
    SitePoint Addict
    Join Date
    Jul 2004
    Location
    Brooklyn, NY
    Posts
    316
    I was trying $1 before but it didn't work because I didn't have the parentheses

    thanks for your help, it works as intended

  4. #4
    SitePoint Addict
    Join Date
    Jul 2004
    Location
    Brooklyn, NY
    Posts
    316
    okay, small problem - it doesn't work for multiple newlines

    EXAMPLE:
    test+

    test
    test

    WILL BE:
    test+<br/>

    test<br/>
    test


    any suggestions? I tried a few things but none worked

  5. #5
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    I tried a few things but none worked
    How about any beginning regex tutorial or book?

    \n+

    Is the text always going to be coming from someone using a UNIX operating system? If not, reread the end of post #2.

  6. #6
    SitePoint Addict
    Join Date
    Jul 2004
    Location
    Brooklyn, NY
    Posts
    316
    Quote Originally Posted by 7stud View Post
    How about any beginning regex tutorial or book?

    \n+
    That doesn't work.

    It takes this:
    test.

    <b>test</b>
    test
    turns it into this:
    test.<br/>
    <b>test</b>
    test<br/>
    when what it should do is this:
    test.<br/>
    <br/>
    <b>test</b>
    test<br/>
    Basically I don't want multiple newlines to be turned into one <br/>
    I want each newline to have a <br/>
    I understand why it's not working, but I can't find a solution.
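The two failure modes can be reproduced with a short sketch. The input is the sample above with a trailing newline assumed, and countBreaks is an illustrative helper name, not part of any library.

```javascript
// Sample from the post, assuming a trailing newline after the last "test".
var input = "test.\n\n<b>test</b>\ntest\n";

// Illustrative helper: how many <br/> tags ended up in the output?
function countBreaks(html) {
  return (html.match(/<br\/>/g) || []).length;
}

// One \n per match: the second \n of the blank line is skipped, because the
// \n in front of it was already consumed by the previous match.
var single = input.replace(/([^>])\n/g, "$1<br/>\n");

// \n+ takes the whole run of newlines as ONE match, so the run is replaced
// by a single <br/> and the blank line disappears.
var run = input.replace(/([^>])\n+/g, "$1<br/>\n");

console.log(JSON.stringify(single), countBreaks(single)); // 2 breaks, 3 wanted
console.log(JSON.stringify(run), countBreaks(run));       // 2 breaks, 3 wanted
```

Either way a fixed replacement string cannot produce one <br/> per newline, since each match yields exactly one replacement.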

    Quote Originally Posted by 7stud View Post
    Is the text always going to be coming from someone using a UNIX operating system? If not, reread the end of post #2.
    This will only be used on a Windows workstation.

  7. #7
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    This will only be used on a Windows workstation.
    Ahh, I see. And what is the newline character on a Windows operating system?

  8. #8
    SitePoint Addict
    Join Date
    Jul 2004
    Location
    Brooklyn, NY
    Posts
    316
    Quote Originally Posted by 7stud View Post
    Ahh, I see. And what is the newline character on a Windows operating system?
    It's still \n - so the code works, but the problem is it converts multiple newlines into one

  9. #9
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    And what is the newline character on a Windows operating system?
    It's still \n
    No, it's not. As listed at the end of post #2, a newline on Windows is \r\n.

    so the code works
    All the \r characters you are leaving around might cause you unforeseen problems, so why do it?

  10. #10
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    For the second parameter of replace(), instead of specifying a string, e.g.

    var result = str.replace(/[a-z]/, "x");

    you can specify a function which returns a string, e.g.:

    var result = str.replace(/[a-z]/, someFunc);

    JavaScript calls the function you specify and automatically passes it certain arguments; the function must return a string, which is then used as the replacement. The first argument is the text matched by the full regex. The second argument is the text matched by the first parenthesized grouping. The third argument is the text matched by the second parenthesized grouping. And so on until all the parenthesized groupings are exhausted. Finally, the position of the full match in the source string is passed as the second-to-last argument, and the last argument is the source string.
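As a small illustration of that argument order, the sketch below records what replace() passes in for a regex with two groups. The function name, the calls array and the "ab cd" input are made up for the example.

```javascript
// Collect the arguments replace() hands to the callback for each match.
var calls = [];

function logArgs(fullMatch, group1, group2, offset, source) {
  calls.push({
    fullMatch: fullMatch, // text matched by the whole regex
    group1: group1,       // text matched by the first (...)
    group2: group2,       // text matched by the second (...)
    offset: offset,       // position of the match in the source string
    source: source        // the string replace() was called on
  });
  // Whatever is returned becomes the replacement text for this match.
  return group2 + group1;
}

// With the g flag the callback runs once per match.
var swapped = "ab cd".replace(/([a-z])([a-z])/g, logArgs);

console.log(swapped); // "ba dc"
console.log(calls);   // two entries, with offsets 0 and 3
```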

    Try this:
    Code:
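    var str = "test.\r\n\r\n\r\n<b>test</b>\r\ntest";
    // g flag: without it only the first run of newlines would be replaced
    var regex = /([^>])((\r\n)+)/g;
    var newStr = str.replace(regex, myFunc);
    
    function myFunc(fullMatch, subMatch1, subMatch2)
    {
    	// subMatch1: the character before the newlines
    	// subMatch2: the whole run of \r\n pairs
    	var result = subMatch1;
    	var breakCount = subMatch2.length/2; // each \r\n is two characters
    
    	for(var i=0; i<breakCount; ++i)
    	{
    		result+="<br/>\n";
    	}
    	
    	return result;
    }
    
    alert(str + "\n-----\n" + newStr);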
    Since to compute the return value, the function doesn't need any additional parenthesized groupings besides the first two, nor the pos of the full match, nor the source string, I didn't define the function with parameters to catch those arguments. The function doesn't need the first argument either, but you need to define a dummy placeholder for that argument because js always sends the full match as the first argument.
    Last edited by 7stud; Feb 14, 2007 at 01:19.

  11. #11
    SitePoint Wizard silver trophy kyberfabrikken's Avatar
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Just remove the [^>] part
    Code:
    str = str.replace(/\n/g, "\n<br/>");

  12. #12
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    Just remove the [^>] part
    Then it wouldn't do what the op claims he/she wants.

  13. #13
    SitePoint Addict
    Join Date
    Jul 2004
    Location
    Brooklyn, NY
    Posts
    316
    7stud, the whole point of this function is to take text from a <textarea> and add <br/> tags wherever there's a new line

    the whole point of [^>] is that I don't want a <br/> after block elements (div, li, etc.)

    That said, I can't understand the difference between \n and \r and how it applies to this situation.
    I can probably look at the source code of any forum textarea and just copy the code, but I wanted to learn it this way so I can understand what's happening.

  14. #14
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Code:
    post=post.replace(/([^>])\n/g, "$1<br/>\n");
    This doesn't work with multiple newlines. When /([^>])\n/ is applied to \n\n, the first newline is captured into $1 and the replacement will be "\n<br/>\n", which is not what you want. What you need here is a "negative lookbehind"; unfortunately, the JavaScript regexp engine did not support that feature when this was written (lookbehind assertions were only added in ECMAScript 2018). A workaround with multiple replacements can look like this:
    Code:
     
    // convert different newline forms to a single LF
    s = s.replace(/\r\n?/g, "\n")
    // remove LF after >
    s = s.replace(/>\n/g, ">")
    // replace LF into <br>
    s = s.replace(/\n/g, "<br>\n")
    Hope this helps.
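The three replacements above can be wrapped in one function, sketched below with illustrative names. The second function is an assumption-laden variant: it relies on the lookbehind syntax from ECMAScript 2018, so it only runs on engines that implement it, and it behaves slightly differently in that the LF after ">" is kept rather than removed.

```javascript
// The three steps from the post, wrapped in a function (name is illustrative).
function newlinesToBr(s) {
  return s
    .replace(/\r\n?/g, "\n")   // CRLF or lone CR -> LF
    .replace(/>\n/g, ">")      // drop one LF that directly follows ">"
    .replace(/\n/g, "<br>\n"); // every remaining LF gets a <br>
}

// Variant using a negative lookbehind. Assumes an ES2018-capable engine.
// Unlike newlinesToBr, the LF after ">" stays in the output.
function newlinesToBrLookbehind(s) {
  return s.replace(/\r\n?/g, "\n").replace(/(?<!>)\n/g, "<br>\n");
}

// Made-up sample with Windows line endings and one blank line.
var sample = "test.\r\n\r\n<b>test</b>\r\ntest";

console.log(JSON.stringify(newlinesToBr(sample)));
// "test.<br>\n<br>\n<b>test</b>test"
console.log(JSON.stringify(newlinesToBrLookbehind(sample)));
// "test.<br>\n<br>\n<b>test</b>\ntest"
```

Both versions emit one <br> per newline, so blank lines survive, and neither adds a <br> directly after a closing tag.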

  15. #15
    SitePoint Addict
    Join Date
    Jul 2004
    Location
    Brooklyn, NY
    Posts
    316
    thanks stereofrog!
    that was exactly what I needed

    Is it possible to do a "negative lookbehind" with PHP? What would it look like?

  16. #16
    SitePoint Addict jtrelfa's Avatar
    Join Date
    Oct 2004
    Location
    Troy, Mi
    Posts
    231
    Quote Originally Posted by knopix View Post
    thanks stereofrog!
    that was exactly what I needed

    Is it possible to do a "negative lookbehind" with PHP? What would it look like?
    PHP:
    PHP Code:
    nl2br("string"); 
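    Inserts a <br /> before every newline in the string; the newline characters themselves are kept. Note that it does not skip newlines that follow a ">".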
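For the JavaScript side of this thread, a rough counterpart of nl2br() might look like the sketch below. It is hand-written for illustration, not a built-in, and like the PHP function it keeps the line ending and has no rule for skipping lines that end in ">".

```javascript
// Rough JavaScript counterpart of PHP's nl2br() (illustrative, not a built-in).
function nl2br(text) {
  // $& stands for the matched line ending, so it is kept after the <br />.
  // \r\n is listed first so a Windows line ending counts as one match.
  return text.replace(/\r\n|\r|\n/g, "<br />$&");
}

console.log(JSON.stringify(nl2br("one\ntwo\r\nthree")));
// "one<br />\ntwo<br />\r\nthree"
```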

  17. #17
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Quote Originally Posted by knopix View Post
    Is it possible to do a "negative lookbehind" with PHP? What would it look like?
    Yes, php supports lookbehinds, both positive (?<=) and negative (?<!).

    PHP Code:
    $text = "
    one
    two

    three

    <b>no break</b>
    four
    ";

    # replace line ending (in any form)
    # when not preceded by ">"
    # (?<!\r)\n stops the LF of a CRLF that follows ">" from matching on its own

    print preg_replace(
        '/(?<!>)(\r\n|\r|(?<!\r)\n)/',
        "<br>\n",
        $text);

