Thanks Tim,
I’ve actually come up with a solution that works, and I’ve also included a set of regex replacements for tables and other junk. It’s a bit clumsy though, so I think I’ll end up rewriting the whole thing. It’s probably better to deal with all block-level elements before inserting <p> tags, and then handle all inline styles afterwards.
Anyway, this is what I’ve got at the moment:
//format carriage returns
$text = str_replace("\r\n", "\n", $text);
$text = str_replace("\r", "\n", $text);
//inline code/
$text = preg_replace('/\\/\\/(.+?)\\//', '<code>$1</code>', $text);
//paragraphs
$text = '<p>' . str_replace("\n\n", '</p><p>', $text) . '</p>';
$text = str_replace("\n", "<br>", $text);
//pre + code
preg_match_all('/<p>\\/\\*.+?\\*\\/<\\/p>/', $text, $matches);
$matches = $matches[0];
$replace = array();
foreach($matches as $match){
    //strip the paragraph markup again and turn <br> back into newlines
    $find = array('<p>', '</p>', '<br>', "/*\n", "\n*/");
    $change = array('', '', "\n", '<pre><code>', '</code></pre>');
    $match = str_replace($find, $change, $match);
    $replace[] = $match;
}
$text = str_replace($matches, $replace, $text);
//headers
$text = preg_replace("/<p>(h[0-9]+)\\. ?(.+?)(<\\/p>|<br>)/i", '<$1>$2</$1><p>', $text);
//mailto [mailto]...[/mailto]
$text = preg_replace('/\\[mailto\\](.+?)\\[\\/mailto\\]/i', '<a href="mailto:$1">$1</a>', $text);
//link [url]...[/url]
$text = preg_replace('/\\[url\\](.+?)\\[\\/url\\]/i', '<a href="$1">$1</a>', $text);
//~emphasised~
$text = preg_replace('/~(.+?)~/', '<em>$1</em>', $text);
//*strongly emphasised*
$text = preg_replace('/\\*(.+?)\\*/', '<strong>$1</strong>', $text);
/* tables
=======
||th1 ||th2 ||th3 ||
----
|td1 |td2 |td3 |
----
|td1 |td2 |td3 |
*/
$text = str_replace('<p>====<br>', '<table><thead><tr>', $text);
$text = str_replace('|| ', '<th>', $text);
$text = str_replace('||<br>', '</thead>', $text);
$text = str_replace('</thead>----<br>', '</thead><tbody><tr>', $text);
$text = str_replace('| ', '<td>', $text);
$text = str_replace(' |</p>', '', $text);
$text = str_replace('|<br>----<br>', '<tr>', $text);
/*
[dl]
Ridiculus
- Nullam quis risus eget urna mollis ornare vel eu leo. Donec id elit non mi porta gravida at eget metus. Integer posuere erat a ante venenatis dapibus posuere velit aliquet.
Dolor Ullamcorper
- Cras justo odio, dapibus ac facilisis in, egestas eget quam. Maecenas faucibus mollis interdum. Donec id elit non mi porta gravida at eget metus. Cras mattis consectetur purus sit amet fermentum.
[/dl]
*/
preg_match_all('/<p>\\[dl].+?\\[\\/dl]/', $text, $matches);
$matches = $matches[0];
$replace = array();
foreach($matches as $match){
    $match = preg_replace('/<br>- (.+?)(<\\/p>|<br>\\[\\/dl])/i', '<dd>$1</dd>', $match);
    $match = '<dl>' . preg_replace('/<p>(\\[dl]<br>)?(.+?)<dd>/', '<dt>$2</dt><dd>', $match) . '</dl>';
    $replace[] = $match;
}
$text = str_replace($matches, $replace, $text);
/* ul
- Ligula
- Elit
- Ridiculus
*/
$text = preg_replace('/(<p>)?- (.+?)(<\\/p>|<br>)/', '<li>$2</li>', $text);
$text = preg_replace('/(<li>(.+)<\\/li>)/', '<ul>$1</ul>', $text);
/* ol
1. Ligula
2. Elit
3. Ridiculus
*/
$text = preg_replace('/<p>([0-9]\\. .+?)<\\/p>/', '<ol>$1</ol>', $text);
$text = preg_replace('/[0-9]\\. (.+?)<br>/', '<li>$1</li>', $text);
$text = preg_replace('/[0-9]\\. (.+?)<\\/ol>/', '<li>$1</li></ol>', $text);
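For the rewrite, here's a rough sketch of the "block-level first, inline styles afterwards" idea, not a finished implementation. The function names (format_text, format_block, format_inline) are made up for illustration, the htmlspecialchars escaping isn't part of the code above, and it only covers headers, lists, code blocks and paragraphs. It assumes PHP 7.4+ and that blocks are separated by blank lines, so a code block containing a blank line would get split and would need extracting before this step.

```php
<?php
// Hypothetical helper: inline styles, applied only once the block type is known.
function format_inline(string $s): string {
    $s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8'); // assumption: escape raw HTML first
    $s = preg_replace('/~(.+?)~/', '<em>$1</em>', $s);
    $s = preg_replace('/\*(.+?)\*/', '<strong>$1</strong>', $s);
    return $s;
}

// Hypothetical helper: turn one blank-line-separated block into HTML.
function format_block(string $block): string {
    $lines = explode("\n", $block);

    // "/*" and "*/" on their own lines: code block, no inline formatting inside
    if (count($lines) >= 2 && $lines[0] === '/*' && end($lines) === '*/') {
        $code = implode("\n", array_slice($lines, 1, -1));
        return '<pre><code>' . htmlspecialchars($code, ENT_QUOTES, 'UTF-8') . '</code></pre>';
    }

    // "h1. Title" (limited to h1-h6 here)
    if (count($lines) === 1 && preg_match('/^h([1-6])\. ?(.+)$/i', $block, $m)) {
        return "<h{$m[1]}>" . format_inline($m[2]) . "</h{$m[1]}>";
    }

    // every line starts with "- ": unordered list
    if (count(preg_grep('/^- /', $lines)) === count($lines)) {
        $items = array_map(fn($l) => '<li>' . format_inline(substr($l, 2)) . '</li>', $lines);
        return '<ul>' . implode('', $items) . '</ul>';
    }

    // every line starts with "1. ", "2. ", ...: ordered list
    if (count(preg_grep('/^[0-9]+\. /', $lines)) === count($lines)) {
        $items = array_map(
            fn($l) => '<li>' . format_inline(preg_replace('/^[0-9]+\. /', '', $l)) . '</li>',
            $lines
        );
        return '<ol>' . implode('', $items) . '</ol>';
    }

    // anything else is a paragraph; single newlines become <br>
    return '<p>' . implode('<br>', array_map('format_inline', $lines)) . '</p>';
}

// Hypothetical entry point: normalise line endings, split into blocks, format each.
function format_text(string $text): string {
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    $blocks = preg_split('/\n{2,}/', trim($text));
    return implode("\n", array_map('format_block', $blocks));
}
```

Calling format_text("h1. Title\n\nSome *bold* text") should give a header followed by a paragraph. Because each block is classified before any inline regex runs, the *...* and ~...~ replacements never touch the contents of a code block, and there's no need to undo <p> and <br> tags afterwards.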
Sorry if it’s a bit long-winded. I’d just like to get your opinion on my approach.
Cheers,
Mike