I am struggling to parse the contents of a text area into paragraphs.
There are various issues, such as what defines a new paragraph. The text in this text area may be typed by hand by the client or copied and pasted from a Word document or similar.
Here is my latest attempt at the code, using just a basic text area field; however, this does not give the correct result.
if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN') {
    $eol = "\r\n";
} elseif (strtoupper(substr(PHP_OS, 0, 3)) == 'MAC') {
    $eol = "\r";
} else {
    $eol = "\n";
}
$article = $_POST["article"];
$article = preg_replace("/\r\n/", "\n", $article);
$paragraphs = explode($eol, $article);
$howManyParagraphs = count($paragraphs);
echo "how many paragraphs = " . $howManyParagraphs;
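For the plain text area version, one possible approach is sketched below. The line endings in the posted text depend on the browser and on whatever was pasted in, rather than on the server's operating system, so the sketch normalises every line ending to "\n" and then splits on blank lines. It assumes that a paragraph is a block of text separated by one or more blank lines; that definition and the helper name splitParagraphs are assumptions for illustration, not something from the original code.

```php
<?php
// Hypothetical helper: split raw textarea text into paragraphs.
// Assumes a paragraph is a block of text separated by one or more blank lines.
function splitParagraphs(string $text): array
{
    // Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Split wherever a newline is followed by an empty or whitespace-only line.
    $parts = preg_split('/\n[ \t]*\n\s*/', trim($text));

    // Trim each piece, drop empty ones, and reindex from zero.
    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

$paragraphs = splitParagraphs("First paragraph.\r\n\r\nSecond one,\r\nstill second.\r\n\r\n\r\nThird.");
echo "how many paragraphs = " . count($paragraphs);
```

If a single line break should count as a new paragraph instead, the split pattern would need to change accordingly.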
And here is my attempt when using the TinyMCE editor. This seems to work a bit better but not 100% of the time.
$article = $_POST["article"];
preg_match_all("/(<h.>.*<\/h.>)*<p>.*<\/p>/iU", $article, $paragraphs);
$howManyParagraphs = count($paragraphs);
echo "how many paragraphs = " . $howManyParagraphs;
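One thing that may explain the inconsistent count: by default preg_match_all fills its third argument with one sub-array per capture group (index 0 holds the full matches), so count() on that array counts groups rather than matches. The function's return value, or count() of the first sub-array, gives the number of matches. A minimal sketch with a made-up sample string:

```php
<?php
// Sample markup standing in for the posted article (made up for illustration).
$article = '<h2>Title</h2><p>One</p><p>Two</p><p>Three</p>';

// The return value is the number of full pattern matches.
$howManyMatches = preg_match_all('/(<h.>.*<\/h.>)*<p>.*<\/p>/iU', $article, $paragraphs);

echo "sub-arrays = " . count($paragraphs) . "\n";      // full matches + one per capture group
echo "matches = " . count($paragraphs[0]) . "\n";      // the full matches themselves
echo "return value = " . $howManyMatches . "\n";
```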
Any advice appreciated
Thanks
Paul
Sounds about right to me! 
Oops sorry Anthony - I missed the notification that there was a reply.
Thank you so much for the code snippet - I will give it a go.
I am coming round to the idea that I am turning this into something really complex and should simply have had two text fields. If there is nothing in the second text field, there is no “Read more” link. That sounds a lot simpler than all this parsing malarkey, and probably easier for the client too. Hmm, I am sure you recommended such a solution :)
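A rough sketch of the two-field idea, in case it helps. The helper name hasReadMore and the sample values are hypothetical; the only real logic is deciding whether the second field contains any actual text once editor markup and non-breaking spaces are stripped out.

```php
<?php
// Hypothetical helper: decide whether to show a "Read more" link.
// Assumes the second field may contain editor markup with no real text.
function hasReadMore(string $more): bool
{
    // Strip tags and decode entities such as &nbsp;.
    $text = html_entity_decode(strip_tags($more), ENT_QUOTES, 'UTF-8');
    // Turn non-breaking spaces into plain spaces so trim() can remove them.
    $text = str_replace("\u{00A0}", ' ', $text);
    return trim($text) !== '';
}

$intro = '<p>Opening paragraph.</p>';   // sample value for the first field
$more  = '<p>&nbsp;</p>';               // sample value for the second field

echo $intro;
if (hasReadMore($more)) {
    echo '<a href="#more">Read more</a>';
}
```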
<?php
$string = '
<p id="bar">Foo</p>
<p>Foo</p>
<p class="bar">Foo</p>
<h4>foo</h4>
';
echo preg_match_all('~<p[^>]*>([^<]*)</p>~i', $string, $m); #3
?>
Or…
<?php
$string = '
<p id="bar">Foo</p>
<p>Foo</p>
<p class="bar">Foo</p>
<h4>foo</h4>
';
$doc = new SimpleXMLElement(sprintf('<root>%s</root>', $string));
echo count($doc->xpath('//p')); #3
?>

The following is close but no cigar:
$howManyMatches = preg_match("#<p[^>]*>(.*)</p>#isU", $article,$paragraphs);
OK, there is a slight flaw in my solution: it does not detect paragraphs that have a class.
My code is as follows:
$article = $_POST["article"];
// Remove paragraphs that are just empty space
$article = str_replace("<p> </p>", "", $article);
$howManyParagraphs = preg_match("/<p>(.*)<\/p>/",$article,$paragraphs);
The above works fine if the paragraphs use plain <p></p> tags, but if they use, say,
<p class="blah"></p>
it doesn’t work.
So I would like help adjusting my preg_match pattern to deal with this alternative paragraph format.
Thanks
Paul
Thanks Anthony I will check that out.
Salathe - I guess it really ought to be a carriage return.
However, I have had another stab at doing this with the TinyMCE editor, and the following seems to work, though I haven’t tested it exhaustively.
$article = $_POST["content"];
$article = str_replace("<p> </p>", "", $article);
$howManyMatches = preg_match_all("/<p>(.*)<\/p>/",$article,$paragraphs);
echo "How many matches = " . $howManyMatches;
So, what defines a new paragraph?
Check out PHP_EOL, Paul. This simplified pattern matches text that is not part of an HTML tag:
(?<=^|>)[^><]+?(?=<|$)
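As a quick illustration of how that pattern might be applied, here is a sketch using preg_match_all on a made-up sample string. Note that the matches are raw text runs between tags, so a paragraph containing inline markup such as <em> comes back as several pieces and surrounding whitespace is kept.

```php
<?php
// Sample markup (made up for illustration).
$html = '<p class="blah">First</p><p>Second <em>bit</em></p>';

// Match runs of text that sit between tags (or at the string boundaries).
preg_match_all('/(?<=^|>)[^><]+?(?=<|$)/', $html, $m);

print_r($m[0]);
```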