Parsing a text field into paragraphs

chuckylefrek · June 30, 2010, 10:45am

I am struggling to parse the contents of a text area into paragraphs.

There are various issues, such as what defines a new paragraph. The text in this text area may be hand-typed by the client or cut and pasted from a Word document or similar.

Here is my latest attempt, using just a basic text area field. However, it does not give the correct result.



if (strtoupper(substr(PHP_OS,0,3)=='WIN')) {
  $eol="\r\n";
} elseif (strtoupper(substr(PHP_OS,0,3)=='MAC')) {
  $eol="\r";
} else {
  $eol="\n";
}

$article = $_POST["article"];	

$article = preg_replace("/\r\n/", "\n", $article);

$paragraphs = explode($eol, $article);			

$howManyParagraphs = count($paragraphs);			

echo "how many paragraphs = " . $howManyParagraphs;
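For the plain text area case, here is a minimal sketch of one alternative. PHP_OS describes the server, while the line endings in submitted text come from the client (browsers typically send \r\n), so this version normalises every line-ending style first and then splits on blank lines. The helper name splitParagraphs and the "blank line separates paragraphs" rule are assumptions for illustration, not something settled in this thread.

```php
<?php
// Hypothetical helper (not from the thread): split plain text into paragraphs.
function splitParagraphs(string $text): array
{
    // Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    $text = preg_replace('/\r\n?/', "\n", $text);

    // Assumption: one or more blank lines separate paragraphs.
    $parts = preg_split('/\n\s*\n/', trim($text));

    // Trim each piece and drop any that are empty.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, 'strlen'));
}

// Sample input standing in for $_POST["article"].
$article = "First paragraph.\r\n\r\nSecond paragraph,\r\nstill the second.\r\n\r\n\r\nThird.";
$paragraphs = splitParagraphs($article);

echo "how many paragraphs = " . count($paragraphs);
```

If a single line break should count as a new paragraph instead, the split pattern could be changed to '/\n+/'. Which rule fits depends on how the client actually types or pastes the text.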

And here is my attempt when using the TinyMCE editor. This seems to work a bit better but not 100% of the time.



$article = $_POST["article"];	

preg_match_all("/(<h.>.*<\\/h.>)*<p>.*<\\/p>/iU", $article, $paragraphs); 

$howManyParagraphs = count($paragraphs);

echo "how many paragraphs = " . $howManyParagraphs;
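One likely reason the count looks wrong here: preg_match_all() fills its third argument with one list per capture group (index 0 being the full matches), so count($paragraphs) reports the number of groups and not the number of paragraphs. The sketch below uses made-up sample markup in place of $_POST["article"] to show the difference.

```php
<?php
// Sample markup standing in for $_POST["article"] (made up for illustration).
$article = '<h2>Title</h2><p>One</p><p>Two</p><p>Three</p>';

// Same pattern as in the post, with ~ delimiters so / needs no escaping.
$howManyParagraphs = preg_match_all('~(<h.>.*</h.>)*<p>.*</p>~iU', $article, $paragraphs);

// Number of groups: the full-match list plus one capture group.
$groupCount = count($paragraphs);

// Number of full matches, taken from the full-match list.
$matchCount = count($paragraphs[0]);

echo "groups = " . $groupCount . "\n";
echo "how many paragraphs = " . $howManyParagraphs . "\n";
echo "full matches = " . $matchCount . "\n";
```

The return value of preg_match_all() and count($paragraphs[0]) agree with each other; either can be used as the paragraph count.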

Any advice appreciated

Thanks

Paul

AnthonySterling · July 1, 2010, 4:10pm

Sounds about right to me!

chuckylefrek · July 1, 2010, 3:45pm

Oops sorry Anthony - I missed the notification that there was a reply.

Thank you so much for the code snippet - I will give it a go.

I am coming round to the idea that I am turning this into something really complex and should simply have had two text fields. If there is nothing in the second text field, there is no “Read more” link. That sounds a lot simpler than all this parsing malarkey, and probably easier for the client too. Hmm, I am sure you recommended such a solution :)

AnthonySterling · July 1, 2010, 12:44pm


<?php
$string = '
<p id="bar">Foo</p>
<p>Foo</p>
<p class="bar">Foo</p>
<h4>foo</h4>
';

echo preg_match_all('~<p[^>]*>([^<]*)</p>~i', $string, $m); #3
?>

Or…


<?php
$string = '
<p id="bar">Foo</p>
<p>Foo</p>
<p class="bar">Foo</p>
<h4>foo</h4>
';

$doc = new SimpleXMLElement(sprintf('<root>%s</root>', $string));
echo count($doc->xpath('//p')); #3
?>
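One caveat with the XML route: SimpleXMLElement needs well-formed XML, so editor output containing &nbsp; (not a predefined XML entity) is likely to fail to parse. A possible alternative, sketched here as an assumption and not something proposed in this thread, is DOMDocument::loadHTML(), which is more forgiving of HTML fragments. The sample string is made up, and the sketch assumes plain ASCII input to sidestep character-encoding handling.

```php
<?php
$string = '
<p id="bar">Foo</p>
<p>&nbsp;</p>
<p class="bar">Foo <em>bar</em></p>
<h4>foo</h4>
';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();

$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    // &nbsp; is decoded to U+00A0; treat it as ordinary whitespace.
    $text = trim(str_replace("\xC2\xA0", ' ', $p->textContent));
    if ($text !== '') {
        $paragraphs[] = $text;
    }
}

echo count($paragraphs);
```

Because textContent flattens nested tags, this gives the text of each paragraph and not its markup. If the inner HTML is needed, $doc->saveHTML($p) returns the serialized element.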

chuckylefrek · July 1, 2010, 12:06pm

The following is close but no cigar



$howManyMatches = preg_match("#<p[^>]*>(.*)</p>#isU", $article,$paragraphs);
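A hedged guess at why this is only close: preg_match() stops after the first match and returns 0 or 1, whereas preg_match_all() keeps going and returns the number of matches. The sketch below keeps the same pattern idea, with one assumption added: a \b after <p so that tags such as <pre> are not mistaken for paragraphs. The sample markup is made up.

```php
<?php
// Sample TinyMCE-style output (made up for illustration).
$article = '<p>One</p><p class="intro">Two</p><p>&nbsp;</p><pre>code</pre>';

// \b stops <p from also matching <pre>; [^>]* allows any attributes.
// s lets . span newlines, U makes .* lazy, i ignores case.
$howManyMatches = preg_match_all('#<p\b[^>]*>(.*)</p>#isU', $article, $paragraphs);

// $paragraphs[1] holds the inner content of each paragraph.
// Drop paragraphs that contain only whitespace or &nbsp;.
$contents = array_values(array_filter($paragraphs[1], function ($html) {
    return trim(str_replace('&nbsp;', ' ', $html)) !== '';
}));

echo "How many matches = " . $howManyMatches . "\n";
echo "Non-empty paragraphs = " . count($contents) . "\n";
```

Filtering after matching also covers empty paragraphs that carry attributes, which a str_replace() on the literal string "<p>&nbsp;</p>" would miss.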

chuckylefrek · July 1, 2010, 11:47am

Ok slight flaw in my solution in that it is not detecting paragraphs that have a class.

My code is as follows:



$article = $_POST["article"];	

// Remove paragraphs that are just empty space
$article = str_replace("<p>&nbsp;</p>", "", $article);			

$howManyParagraphs = preg_match("/<p>(.*)<\\/p>/",$article,$paragraphs);

The above works fine if the paragraphs use plain <p></p> tags, but if they use, say, a <p> tag with a class attribute, it doesn’t work.

So I would like help adjusting my preg_match pattern to deal with this alternative paragraph format.

Thanks

Paul

chuckylefrek · June 30, 2010, 12:30pm

Thanks Anthony I will check that out.

Salathe - I guess it really ought to be a carriage return.

However, I have had another stab at doing this with the TinyMCE editor, and the following seems to work, though I haven’t tested it exhaustively.



$article = $_POST["content"];	

$article = str_replace("<p>&nbsp;</p>", "", $article);

$howManyMatches = preg_match_all("/<p>(.*)<\\/p>/",$article,$paragraphs);

echo "How many matches = " . $howManyMatches;

salathe · June 30, 2010, 11:51am

So, what defines a new paragraph?

AnthonySterling · June 30, 2010, 11:33am

Check out PHP_EOL, Paul. This simplified pattern matches text that is not part of an HTML tag.


(?<=^|>)[^><]+?(?=<|$)
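As a rough illustration of how that pattern behaves, here it is run through preg_match_all() on a made-up string. The sample has no whitespace between tags; with line breaks between tags, those breaks would be matched as text too.

```php
<?php
// Made-up sample with no whitespace between the tags.
$string = '<p id="bar">Foo</p><p>Bar</p><h4>Baz</h4>';

// Text must start right after ">" (or the start of the string) and
// end right before "<" (or the end of the string).
preg_match_all('/(?<=^|>)[^><]+?(?=<|$)/', $string, $m);

// $m[0] holds each run of text found between tags.
print_r($m[0]);
```

Note that this collects every text run regardless of which tag contains it, so the heading text is included alongside the paragraph text.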
