RegEx BBCode issue

Hi there,

I’m writing a function to convert BBCode into HTML. It’s an adaptation of a script by Kevin Yank in his ‘Build Your Own Database Driven Website with PHP & MySQL’… Anyway, I want to replace /* */ comment marks with <pre></pre> tags.

Here’s what I’ve got so far:


function html($text){
  return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

function htmlout($text){
  echo html($text);
}

function bbcode2html($text){
  $text = html($text);

  //format carriage returns
  $text = str_replace("\r\n", "\n", $text);
  $text = str_replace("\r", "\n", $text);

  //paragraphs
  $text = '<p>' . str_replace("\n\n", '</p><p>', $text) . '</p>';
  $text = str_replace("\n", "<br>", $text);
	
  //code... fine but inserts <br>
  $text = preg_replace('/<p>\\/\\*(.+?)\\*\\//', '<pre>$1</pre>', $text);
	
  //headers
  $text = preg_replace("/<p>([0-9]+)\\. ?(.+?)<\\/p>/", '<h$1>$2</h$1>', $text);
	
  return $text;
}

function bbcodeout($text){
  echo bbcode2html($text);
}

This works, but I get <br> tags inside my <pre> tags, which is a shame really, cos it’s preformatted text and doesn’t need them.

I tried doing this, but it didn’t work:


$text = preg_replace('/<p>\\/\\*(.+?)\\*\\//', '<pre>' . str_replace('/<br>/', "\n", $1) . '</pre>', $text);

Is there a better way of approaching this?

Many thanks in advance,
Mike

You are replacing ALL "\n" characters in the line above with <br>. The solution is to change the string replace to a regexp that skips over any "\n"s that are within the comment section.

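You are trying to do a string replace on something that, at the time, doesn’t exist: PHP evaluates the str_replace() call before preg_replace() ever runs, so there is no captured $1 for it to work on.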

In that case, you will need to do another preg_replace to replace any <br>s found between <pre> tags, as a separate step after the first replace.
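Something along these lines might do it. This is only a rough sketch: strip_br_in_pre is a made-up helper name, and it assumes PHP 5.3+ (for the closure) and that your <pre> blocks are never nested. It uses preg_replace_callback rather than plain preg_replace, so the str_replace runs on each match after it has been captured.

```php
<?php
// Hypothetical helper (not part of the original script): turns <br> back
// into newlines, but only inside <pre>...</pre> blocks.
function strip_br_in_pre($html){
  return preg_replace_callback(
    '/<pre>(.*?)<\/pre>/s', // "s" lets . match newlines inside the block
    function ($m) {
      // $m[1] is the text captured between the <pre> tags
      return '<pre>' . str_replace('<br>', "\n", $m[1]) . '</pre>';
    },
    $html
  );
}

// Usage: <br> outside the <pre> block is left untouched
$html = '<p>intro<br>text</p><pre>line 1<br>line 2</pre>';
echo strip_br_in_pre($html);
```

You would call it once, right after the line that creates the <pre> tags in bbcode2html().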

Thanks Tim,

I’ve actually come up with a solution which works, and I’ve also included a set of regex replacements for tables and other junk. It works, but I think it’s a bit clumsy, so I think I’ll end up rewriting the whole thing. I think it’s probably better to deal with all block level elements before inserting <p> tags, then all inline styles afterwards.

Anyways, this is what I got at the moment:


//format carriage returns
  $text = str_replace("\r\n", "\n", $text);
  $text = str_replace("\r", "\n", $text);
	
  //inline code/
  $text = preg_replace('/\\/\\/(.+?)\\//', '<code>$1</code>', $text);

  //paragraphs
  $text = '<p>' . str_replace("\n\n", '</p><p>', $text) . '</p>';
  $text = str_replace("\n", "<br>", $text);
	
  //pre + code
  preg_match_all('/<p>\\/\\*.+?\\*\\/<\\/p>/', $text, $matches);
  $matches = $matches[0];
  $replace = array();
  foreach($matches as $match){
    $find = array('<p>', '</p>', '<br>', "/*\n", "\n*/");
    $change = array('', '', "\n", '<pre><code>', '</code></pre>');
    $match = str_replace($find, $change, $match);
    $replace[] = $match;
  }
  $text = str_replace($matches, $replace, $text);
	
  //headers
  $text = preg_replace("/<p>(h[0-9]+)\\. ?(.+?)(<\\/p>|<br>)/i", '<$1>$2</$1><p>', $text);
	
  //mailto [mailto]...[/mailto]
  $text = preg_replace('/\\[mailto\\](.+?)\\[\\/mailto\\]/i', '<a href="mailto:$1">$1</a>', $text);
	
  //link ...
  $text = preg_replace('/\\[url\\](.+?)\\[\\/url\\]/i', '<a href="$1">$1</a>', $text);
	
  //~emphasised~
  $text = preg_replace('/~(.+?)~/', '<em>$1</em>', $text);
	
  //*strongly emphasised*
  $text = preg_replace('/\\*(.+?)\\*/', '<strong>$1</strong>', $text);

  /* tables
  =======
  ||th1		||th2			||th3			||
  ----
  |td1			|td2			|td3			|
  ----
  |td1			|td2			|td3			|
  */
  $text = str_replace('<p>====<br>', '<table><thead><tr>', $text);
  $text = str_replace('|| ', '<th>', $text);
  $text = str_replace('||<br>', '</thead>', $text);
  $text = str_replace('</thead>----<br>', '</thead><tbody><tr>', $text);
  $text = str_replace('| ', '<td>', $text);
  $text = str_replace(' |</p>', '', $text);
  $text = str_replace('|<br>----<br>', '<tr>', $text);
	
  /*
  [dl]
  Ridiculus
  - Nullam quis risus eget urna mollis ornare vel eu leo. Donec id elit non mi porta gravida at eget metus. Integer posuere erat a ante venenatis dapibus posuere velit aliquet.
	
  Dolor Ullamcorper
  - Cras justo odio, dapibus ac facilisis in, egestas eget quam. Maecenas faucibus mollis interdum. Donec id elit non mi porta gravida at eget metus. Cras mattis consectetur purus sit amet fermentum.
  [/dl]
  */
  preg_match_all('/<p>\\[dl].+?\\[\\/dl]/', $text, $matches);
  $matches = $matches[0];
  $replace = array();
  foreach($matches as $match){
    $match = preg_replace('/<br>- (.+?)(<\\/p>|<br>\\[\\/dl])/i', '<dd>$1</dd>', $match);
    $match = '<dl>' . preg_replace('/<p>(\\[dl]<br>)?(.+?)<dd>/', '<dt>$2</dt><dd>', $match) . '</dl>';
    $replace[] = $match;
  }
  $text = str_replace($matches, $replace, $text);
	
  /* ul
  - Ligula
  - Elit
  - Ridiculus
  */
  $text = preg_replace('/(<p>)?- (.+?)(<\\/p>|<br>)/', '<li>$2</li>', $text);
  $text = preg_replace('/(<li>(.+)<\\/li>)/', '<ul>$1</ul>', $text);
	
  /* ol
  1. Ligula
  2. Elit
  3. Ridiculus
  */
  $text = preg_replace('/<p>([0-9]\\. .+?)<\\/p>/', '<ol>$1</ol>', $text);
  $text = preg_replace('/[0-9]\\. (.+?)<br>/', '<li>$1</li>', $text);
  $text = preg_replace('/[0-9]\\. (.+?)<\\/ol>/', '<li>$1</li></ol>', $text);

Sorry if it’s a bit long-winded. I’d just like to get your opinion on my approach.

Cheers,
Mike