PHP function to turn double new lines into paragraph but with exceptions

I have some code that replaces double line breaks with <p> and single with <br>. However, I need to have some exceptions such that I don’t add <p>...</p> around <h2>, <h3>, <h4>, <ul> and also inside <ul>...</ul>

How can I achieve this?

$text = nl2br($text, false);
$text = '<p>' . preg_replace('#(<br>[\r\n\s]+){2}#', '</p><p>', $text) . '</p>';

By not putting header tags inside of paragraph tags. (In all seriousness, your code takes whatever text you give it and puts a paragraph tag around it regardless. In the case of headers, that’s already redundant.)

Your code appears to be designed to wrap non-tagged text with paragraph tags. So the idea would be just to not pass tagged text to this function in the first place.

Thanks for the reply!

Well I have database content as a variable that already contains the HTML tags, so unfortunately I need to apply the paragraph processing with them there already.

So if the database content already has the HTML tags, they should already have the paragraph tags too :stuck_out_tongue:

It’s probably actually easier for you to edit your database entries to add the paragraph tags than it is to try and do all of this at runtime (repeatedly). But.

(time to spitball. Untested, and probably not the most elegant solution.)

preg_match_all the items you want to skip. Store them in an array.
preg_replace the items to be skipped with a placeholder, like “###TOREPLACE###”.
Paragraph-wrap your text.
foreach entry in the array stored earlier,
preg_replace ~(<p>)?###TOREPLACE###(</p>)?~ with the entry, putting a limit of 1 on the replacement. (Use a delimiter other than /, because the pattern itself contains </p>.)
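The steps above might look roughly like the sketch below. It is untested against real CMS content; the function name and the tag list are illustrative assumptions, and it uses preg_replace_callback instead of plain preg_replace so that `$` or `\` inside a restored block is not read as a backreference. It also assumes the protected blocks are separated from the surrounding text by blank lines and that lists are not nested.

```php
<?php
// Sketch of the placeholder approach. The function name and the tag list are
// illustrative assumptions, not part of the original code.
function wrapParagraphsWithPlaceholders(string $text): string
{
    // Blocks that must not be wrapped in <p> or receive <br>.
    $pattern = '#<(h[2-4]|ul)\b[^>]*>.*?</\1>#is';

    // 1. Collect the protected blocks and swap each for a numbered placeholder.
    $blocks = [];
    $text = preg_replace_callback($pattern, function (array $m) use (&$blocks) {
        $blocks[] = $m[0];
        return '###TOREPLACE' . (count($blocks) - 1) . '###';
    }, $text);

    // 2. Paragraph-wrap what is left, exactly as in the original snippet.
    $text = nl2br($text, false);
    $text = '<p>' . preg_replace('#(<br>[\r\n\s]+){2}#', '</p><p>', $text) . '</p>';

    // 3. Put each block back, dropping a <p>, <br> or </p> directly around it.
    foreach ($blocks as $i => $block) {
        $text = preg_replace_callback(
            '~(<p>)?###TOREPLACE' . $i . '###(<br>\s*)?(</p>)?~',
            function () use ($block) {
                return $block;
            },
            $text,
            1
        );
    }

    // Remove any empty paragraphs left behind.
    return preg_replace('~<p>\s*</p>~', '', $text);
}

echo wrapParagraphsWithPlaceholders("<h2>Title</h2>\n\nFirst\n\nSecond");
```

Numbering the placeholders means each block goes back in its own position, even if the same block appears twice.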

Well, it’s a custom CMS, so \n is used for saving, unfortunately.

I was actually thinking it would be best to split on double line breaks (not single), then take each array item and check whether it contains certain HTML tags.

Then skip the ones that do, add opening and closing p tags around the ones that don’t, and recombine everything into one variable at the end.

Say I have an array with HTML tags to ignore. What’s the best code for this approach?
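For reference, a minimal sketch of that split-and-check idea could look like this. The function name and the `$skipTags` parameter are hypothetical, and it assumes each chunk to skip starts with one of the listed tags.

```php
<?php
// Sketch of the split-and-check approach. $skipTags and the function name are
// illustrative assumptions.
function wrapChunks(string $text, array $skipTags = ['h2', 'h3', 'h4', 'ul']): string
{
    // Matches a chunk that begins with one of the tags to skip, e.g. "<h2>" or "<ul ".
    $skipPattern = '#^<(' . implode('|', $skipTags) . ')\b#i';

    // Normalise line endings, then split on blank lines (two or more newlines).
    $text = str_replace(["\r\n", "\r"], "\n", trim($text));
    $chunks = preg_split('#\n{2,}#', $text);

    $out = [];
    foreach ($chunks as $chunk) {
        if (preg_match($skipPattern, $chunk)) {
            $out[] = $chunk; // leave headings and lists untouched
        } else {
            $out[] = '<p>' . nl2br($chunk, false) . '</p>'; // single \n becomes <br>
        }
    }

    return implode("\n", $out);
}

echo wrapChunks("<h2>Title</h2>\n\nFirst line\nsecond line\n\n<ul>\n<li>a</li>\n</ul>");
```

One known weakness: a blank line inside a `<ul>...</ul>` splits the list into two chunks, and the second chunk no longer starts with a tag from the skip list, so it gets wrapped in a paragraph.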

In cases like these you’re far better off parsing the string bit by bit, keeping track of the current state and acting on that, rather than trying to explode etc., which is almost always only an approximation of what needs to happen.

For example, what would happen when a newline occurs within an H2? Stuff like that is really hard to fix with a rigid explode-based approach.

I would suggest using a package like this to tokenize the HTML, then walking the array and processing it into the structure you need. It is a bit harder than simply using explode etc., but the results will be much more robust and less sensitive to whimsical HTML (such as a newline within a header).
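As a rough illustration of walking tokens while keeping state, here is a sketch that uses preg_split() as a stand-in tokenizer rather than a dedicated package. The function name and the tag list are assumptions, and the tag regex is a simplification that would trip over a `>` inside an attribute value.

```php
<?php
// Sketch only: preg_split() stands in for a real HTML tokenizer, and the
// function name and tag list are assumptions made for illustration.
function processTokens(string $text, array $blockTags = ['h2', 'h3', 'h4', 'ul']): string
{
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // Split into tags and the text between them, keeping the tags as tokens.
    $tokens = preg_split('#(<[^>]+>)#', $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $out = '';
    $buffer = '';  // loose text (and inline tags) waiting to be wrapped
    $depth = 0;    // how many protected block tags are currently open

    // Wrap the buffered loose text in paragraphs and append it to the output.
    $flush = function () use (&$out, &$buffer) {
        foreach (preg_split('#\n{2,}#', trim($buffer)) as $para) {
            if ($para !== '') {
                $out .= '<p>' . nl2br($para, false) . '</p>';
            }
        }
        $buffer = '';
    };

    foreach ($tokens as $token) {
        $isBlockTag = preg_match('#^<(/?)([a-z][a-z0-9]*)#i', $token, $m)
            && in_array(strtolower($m[2]), $blockTags, true);

        if ($isBlockTag) {
            if ($depth === 0) {
                $flush(); // emit pending paragraphs before the block starts
            }
            // Opening tags increase the depth, closing tags decrease it.
            $depth = ($m[1] === '') ? $depth + 1 : max(0, $depth - 1);
            $out .= $token;
        } elseif ($depth > 0) {
            $out .= $token;    // inside a protected block: copy verbatim
        } else {
            $buffer .= $token; // loose text or inline tag
        }
    }
    $flush();

    return $out;
}

echo processTokens("<h2>Hello\nWorld</h2>\n\nSome text\n\nMore text");
```

Because the depth counter tracks whether a protected block is open, a newline inside a heading or a blank line inside a list is copied through untouched instead of being turned into `<br>` or `<p>`.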

Also, processes like these are awesome to code using Test Driven Development because you have well defined inputs and expected outputs.

For example:

use PHPUnit\Framework\TestCase;

class HtmlProcessorTest extends TestCase
{
    public function testReplaceDoubleNewlineWithParagraph()
    {
        $processor = new HtmlProcessor();
        $this->assertEquals('<p>Hello</p><p>World</p>', $processor->process("Hello\n\nWorld"));
    }

    public function testReplaceNewlineWithBr()
    {
        $processor = new HtmlProcessor();
        $this->assertEquals('Hello<br />World', $processor->process("Hello\nWorld"));
    }

    public function testDoNotWrapHeaders()
    {
        $processor = new HtmlProcessor();
        // A header passed in as-is should come back unchanged, with no <p> around it.
        $this->assertEquals('<h1>Hello World</h1>', $processor->process('<h1>Hello World</h1>'));
    }
}

You can then execute these tests using PHPUnit.

This has the advantages that:

  1. You can see that what you wrote actually works
  2. You can see that any addition/modification to the code doesn’t break behaviour of previously written code
  3. Your coworkers can read the tests and see what the code is supposed to do

It costs some time to write the tests, but in the end you’ll find it will be well worth it.
