LuckyB
July 15, 2010, 10:16am
1
Hi
I’m trying to write a preg_replace_callback that will call a function and pass it the content of a span tag.
e.g. find all <span class="glossary"> … </span> and pass the content (the …) to: function parse_content($content).
But I can’t get it to work.
So far I have tried:
return preg_replace_callback('<span class="glossary">(*.)</span>','parse_content',$content);
But get an error.
Any help would be appreciated.
Thanks
The following should do the trick:
preg_replace_callback(
    '|<(?:[\\w]+).*class=".*glossary.*".*>(.*)</(?:[\\w]+)>|isU',
    'parse_content_array',
    $content
);
You could also use DOM to achieve this (it may even be faster), but that’s a different story.
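For anyone curious about the DOM route, here is a rough sketch of how it might look. It assumes a stand-in dom_stub_parse_content() that only brackets its input (the real parse_content() does far more), and the helper names dom_inner_html() and dom_replace_glossary() as well as the glossary-root wrapper id are made up for illustration. It also assumes the callback returns well-formed markup, because appendXML() rejects anything else, and it does not try to handle glossary elements nested inside each other.

```php
<?php
// Hypothetical stand-in for the plugin's parse_content(); it only brackets the text.
function dom_stub_parse_content($content)
{
    return '[' . $content . ']';
}

// Serialise the children of a node back to an HTML string.
function dom_inner_html(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical helper: run $callback over the inner HTML of every element
// whose class list contains "glossary" and return the rewritten fragment.
function dom_replace_glossary($html, $callback)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    // The meta tag hints UTF-8; the wrapper div lets us pull the fragment back out.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="glossary-root">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Matches "glossary" as a whole word inside a space-separated class list.
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " glossary ")]';

    foreach ($xpath->query($query) as $node) {
        $fragment = $doc->createDocumentFragment();
        // appendXML() only accepts well-formed XML, which real-world HTML may not be.
        if (@$fragment->appendXML(call_user_func($callback, dom_inner_html($node)))) {
            while ($node->firstChild) {
                $node->removeChild($node->firstChild);
            }
            $node->appendChild($fragment);
        }
    }

    $root = $xpath->query('//div[@id="glossary-root"]')->item(0);
    return dom_inner_html($root);
}

$html = 'Some <span id="x" class="first glossary">term</span> and <em class="glossary-note">other</em> text.';
$result = dom_replace_glossary($html, 'dom_stub_parse_content');
echo $result, "\n";
```

The XPath test on the class attribute is what lets this match any tag name and any position of glossary within a multi-class attribute, without the regex having to know about either.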
LuckyB
July 15, 2010, 11:11pm
3
LuckyB:
OK, one BIG thing I noticed: this creates some funny results if there is more than one match.
I think it has something to do with:
function parse_content_array($content)
{
    return parse_content($content[1]);
}
The fixed [1].
Ok fixed this.
Any ideas on how to make it work for the following types of scenarios:
<anything class="glossary">…</anything>
<anything … class="glossary">…</anything>
<anything … class="glossary" …>…</anything>
Thanks
LuckyB
July 15, 2010, 2:31pm
4
OK, one BIG thing I noticed: this creates some funny results if there is more than one match.
I think it has something to do with:
function parse_content_array($content)
{
    return parse_content($content[1]);
}
The fixed [1].
The expression you’re using to match text between the span tags is back-to-front. It should be:
'<span class="glossary">(.*)</span>'
Also, when the matched text is passed to the parse_content function, it will be in the form of an array. So the actual matched text would be $content[1]. You’d need to create a separate function to handle this:
function parse_content_array($content)
{
    return parse_content($content[1]);
}
return preg_replace_callback('|<span class="glossary">(.*)</span>|', 'parse_content_array', $content);
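On the "funny results" with more than one match reported elsewhere in this thread: one likely cause is that (.*) is greedy, so it runs to the last </span> in the content rather than the first. The sketch below compares the greedy and lazy forms. It assumes a stand-in parse_content() that only brackets its input (the real plugin function does much more), and the callback here re-wraps the result in the span, which is a choice made for the demo; the version above returns only the parsed content, so the tags themselves are dropped.

```php
<?php
// Stand-in for the plugin's parse_content(), for illustration only.
function parse_content($content)
{
    return '[' . $content . ']';
}

// The callback receives the whole match array:
// [0] is the full match, [1] is the first capture group.
function parse_content_array($matches)
{
    return '<span class="glossary">' . parse_content($matches[1]) . '</span>';
}

$html = '<span class="glossary">foo</span> and <span class="glossary">bar</span>';

// Greedy (.*) runs to the LAST </span>, so both spans become one match.
$greedy = preg_replace_callback(
    '|<span class="glossary">(.*)</span>|s',
    'parse_content_array',
    $html
);

// Lazy (.*?) stops at the FIRST </span>, so each span is handled on its own.
$lazy = preg_replace_callback(
    '|<span class="glossary">(.*?)</span>|s',
    'parse_content_array',
    $html
);

echo $greedy, "\n";
echo $lazy, "\n";
```

The U modifier flips the default greediness for the whole pattern, so (.*) with U behaves like (.*?) without it.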
LuckyB
July 15, 2010, 12:58pm
6
Thanks mate, that works well!!
Could you suggest a way of going from matching just <span class="glossary">…</span> to also matching:
<div class="glossary">…</div>
<ul class="glossary">…</ul>
…
<anything class="glossary">…</anything>
It should also allow the class attribute to be anywhere within the tag,
e.g.: <anything id="something" class="glossary">
etc.
Being able to match a case like this would be nice too, but is probably much harder:
e.g.: <anything style="something" class="firstclass glossary thirdclass">
where glossary is one of several classes on the element we want to match.
But the main thing is matching any HTML tag with class="glossary".
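One possible pattern for the cases listed above, offered as a sketch rather than something tested against arbitrary HTML. It assumes double-quoted attributes and no nesting of same-named tags inside a match; glossary_stub() and glossary_callback() are made-up names, with the stub standing in for parse_content(). The tag name is captured and reused through a backreference so the closing tag has to match the opening one, and glossary is only accepted as a whole word within the class list.

```php
<?php
// Hypothetical stand-in for the plugin's parse_content(); it just brackets the text.
function glossary_stub($content)
{
    return '[' . $content . ']';
}

// Group 1: the whole opening tag, group 2: the tag name,
// group 3: the inner content, group 4: the closing tag (same name via \2).
$pattern = '~(<(\w+)\b[^>]*\sclass="(?:[^"]*\s)?glossary(?:\s[^"]*)?"[^>]*>)(.*?)(</\2>)~is';

function glossary_callback($m)
{
    // Keep the original tags and only rewrite what sits between them.
    return $m[1] . glossary_stub($m[3]) . $m[4];
}

$html = '<div id="a" class="glossary">one</div>'
      . '<ul class="first glossary third"><li>two</li></ul>'
      . '<p class="glossary-note">three</p>';

$result = preg_replace_callback($pattern, 'glossary_callback', $html);
echo $result, "\n";
```

The last element is left alone because glossary-note is a different class name; a plain .*glossary.* test on the attribute would have matched it.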
LuckyB
July 15, 2010, 11:37am
7
The raw content is, well, a big block of HTML from a WordPress article.
The function parse_content is code from a WordPress plugin (that I am trying to modify). Please note, I didn’t write/modify the following code:
function parse_content($content){
    //Run the glossary parser
    $glossaryPageID = get_option('red_glossaryID');
    if (((!is_page() && get_option('red_glossaryOnlySingle') == 0) OR
    (!is_page() && get_option('red_glossaryOnlySingle') == 1 && is_single()) OR
    (is_page() && get_option('red_glossaryOnPages') == 1))){
        $glossary_index = get_children(array(
            'post_type' => 'glossary',
            'post_status' => 'publish',
        ));
        if ($glossary_index){
            $timestamp = time();
            foreach($glossary_index as $glossary_item){
                $timestamp++;
                $glossary_title = $glossary_item->post_title;
                $glossary_search = '/\\b'.$glossary_title.'[A-Za-z]*?\\b(?=([^"]*"[^"]*")*[^"]*$)/i';
                $glossary_replace = '<a'.$timestamp.'>$0</a'.$timestamp.'>';
                $content_temp = preg_replace($glossary_search, $glossary_replace, $content);
                $content_temp = rtrim($content_temp);
                $link_search = '/<a'.$timestamp.'>('.$glossary_item->post_title.'[A-Za-z]*?)<\\/a'.$timestamp.'>/i';
                if (get_option('red_glossaryTooltip') == 1) {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '" onmouseover="tooltip.show(\'' . addslashes($glossary_item->post_content) . '\');" onmouseout="tooltip.hide();">$1</a>';
                }
                else {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '">$1</a>';
                }
                $content_temp = preg_replace($link_search, $link_replace, $content_temp);
                $content = $content_temp;
            }
        }
    }
    return $content;
}
Can you post the raw content as well as the function parse_content() here?
LuckyB
July 15, 2010, 11:07am
9
Sorry, it was a warning:
Warning: preg_replace_callback() [function.preg-replace-callback]: Unknown modifier '(' in
LuckyB
July 15, 2010, 11:11am
10
With:
return preg_replace_callback('|<span class="glossary">(*.)</span>|','parse_content',$content);
…I get:
Warning: preg_replace_callback() [function.preg-replace-callback]: Compilation failed: nothing to repeat at offset 24 in …
Is the error perhaps due to the missing pattern delimiters?
Try:
return preg_replace_callback('|<span class="glossary">(*.)</span>|','parse_content',$content);
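For what it's worth, the two warnings in this thread appear to have separate causes, and the small sketch below tries to show them side by side. It uses preg_match() on a one-line sample string instead of the real content, and the variable names are made up for the demo. The @ only hides the warnings so that the return values can be compared.

```php
<?php
$html = '<span class="glossary">term</span>';

// 1. No explicit delimiters: PCRE takes "<" and ">" as the delimiter pair,
//    so "(.*)</span>" is read as modifiers and compilation fails.
$noDelimiters = @preg_match('<span class="glossary">(.*)</span>', $html, $m1);

// 2. Delimiters present, but "(*.)" puts the quantifier before anything
//    it could repeat, so the pattern still fails to compile.
$badQuantifier = @preg_match('|<span class="glossary">(*.)</span>|', $html, $m2);

// 3. Delimiters present and the group written as "(.*)".
$ok = preg_match('|<span class="glossary">(.*)</span>|', $html, $m3);

var_dump($noDelimiters, $badQuantifier, $ok, $m3[1]);
```

Any non-alphanumeric, non-backslash character can serve as the delimiter; | or ~ are handy here because / would need escaping in </span>.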