  1. #1
    SitePoint Enthusiast
    Join Date
    Jun 2003
    Location
    Sweden
    Posts
    47

    Regular expression pattern to split text into paragraphs

    I could really use a regular expression pattern to split text into paragraphs. Can anyone point me in the right direction?

  2. #2
    SitePoint Wizard
    Join Date
    Dec 2004
    Location
    At My Desk!!
    Posts
    1,642
    You might need to give an example of what you want. Do you mean creating a new paragraph after a certain number of lines, that type of thing?
    "Am I the only one doing ASP.NET in Delphi(Pascal)?"

  3. #3
    SitePoint Enthusiast
    Join Date
    Jun 2003
    Location
    Sweden
    Posts
    47
    I'm using preg_match_all() to find chunks of text (paragraphs) in a CMS. The users enter text in a textarea like so:
    Etiam erat mi, viverra quis, vestibulum sit amet, commodo in, est. Aenean eget dui vitae ipsum blandit posuere. Ut sollicitudin porttitor arcu. Sed est sem, iaculis sit amet, tempus eget, commodo sit amet, sapien. Aenean at mi eu pede pulvinar sagittis. Ut rutrum wisi sit amet nunc. Suspendisse potenti. Vestibulum convallis sagittis erat. Curabitur sit amet mi. Donec sed quam. Vestibulum ut lorem.

    Etiam erat mi, viverra quis, vestibulum sit amet, commodo in, est. Aenean eget dui vitae ipsum blandit posuere. Ut sollicitudin porttitor arcu. Sed est sem, iaculis sit amet, tempus eget, commodo sit amet, sapien. Aenean at mi eu pede pulvinar sagittis. Ut rutrum wisi sit amet nunc. Suspendisse potenti. Vestibulum convallis sagittis erat. Curabitur sit amet mi. Donec sed quam. Vestibulum ut lorem.
    And I want to collect all the paragraphs into an array using preg_match_all(). I have tried the following code, but there are a lot of empty matches so the pattern needs refinement.
    Code:
    preg_match_all('/[^\r\n]*/', $article, $paragraphs, PREG_PATTERN_ORDER);
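
The empty matches can be reproduced with a small script. This is only a sketch: the sample text and the variable names `$withStar` and `$withPlus` are made up for illustration. It suggests that the `*` quantifier is the cause, since it allows zero-length matches at each line break, while `+` requires at least one character per match.

```php
<?php
// Hypothetical sample input standing in for the textarea content in $article.
$article = "First paragraph.\r\n\r\nSecond paragraph.";

// `*` permits zero-length matches, so empty strings show up in the result.
preg_match_all('/[^\r\n]*/', $article, $withStar, PREG_PATTERN_ORDER);

// `+` needs at least one non-line-break character, so only the paragraphs remain.
preg_match_all('/[^\r\n]+/', $article, $withPlus, PREG_PATTERN_ORDER);

print_r($withPlus[0]);
```

Note that `+` treats every single line break as a paragraph boundary, so this only fits input where each paragraph sits on one line.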

  4. #4
    SitePoint Wizard siteguru
    Join Date
    Oct 2002
    Location
    Scotland
    Posts
    3,609
    That's probably because there are two \r\n sequences between paragraphs (one after "lorem." and one before "Etiam", i.e. the empty line). Also, are you sure that the line break is always going to be \r\n?
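
One way to sidestep the line-ending question is to normalise all line breaks first and then split on blank lines. The following is a rough sketch under that assumption; the sample text and the variable names `$normalized` and `$paragraphs` are illustrative only.

```php
<?php
// Hypothetical input mixing \r\n, \n and bare \r line endings.
$article = "First paragraph.\r\n\r\nSecond paragraph.\n\nThird paragraph.\r\rFourth paragraph.";

// Convert every \r\n pair and every lone \r to a single \n.
$normalized = preg_replace('/\r\n?/', "\n", $article);

// Split wherever two line breaks are separated only by optional whitespace,
// and drop any empty pieces from the result.
$paragraphs = preg_split('/\n\s*\n/', $normalized, -1, PREG_SPLIT_NO_EMPTY);

print_r($paragraphs);
```

Splitting with preg_split() instead of matching with preg_match_all() means the pattern only has to describe the separator, not the paragraph itself.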
    Ian Anderson
    www.siteguru.co.uk

  5. #5
    SitePoint Enthusiast
    Join Date
    Sep 2006
    Posts
    49
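    Try this:
    PHP Code:
    $test = 'this is a paragraph
    this is not a paragraph

    this is a paragraph';

    // Split wherever two or more consecutive line breaks occur.
    $arr = preg_split('/(\s?\n){2,}\s?/', $test);

    // Drop pieces that are empty or whitespace only.
    // (A closure is used here because create_function() was removed in PHP 8.0.)
    var_dump(array_filter($arr, function ($a) { return trim($a) != ""; }));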

  6. #6
    SitePoint Enthusiast
    Join Date
    Jun 2003
    Location
    Sweden
    Posts
    47
    Wow, thanks Jenk! That was exactly what I was looking for.

