[Solved] Regular expression - parse string

Hi,

I'm trying to extract everything in the string below from the [caption] tag to the [/caption] tag, including the tags themselves. preg_match_all seems to me the most suitable function. Can you help with the pattern, please?

$str = "Some text... [caption id=\"attachment_275305\" align=\"aligncenter\" width=\"640\"]<a href=\"http://www.site.com\"><img class=\"size-full\" src=\"http://www.site.com\" alt=\"pic\" width=\"640\" height=\"345\" /><\/a> Photo by[/caption] ...more text";

Many thanks.

"~\[caption(.*?)\](.*?)\[/caption\]~i"

Note: trim each captured subpattern to remove extraneous spacing.
Edit: you probably also want the case-insensitive flag (the trailing i).
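A minimal sketch of how that pattern could be used with preg_match_all. The sample string and the variable names ($blocks, $attributes, $contents) are made up for illustration and are not from the original post.

```php
<?php
// Hypothetical sample input with two caption blocks (not the poster's data).
$str = 'Intro [caption id="a1" width="640"]<img src="one.jpg" /> First[/caption] middle '
     . '[CAPTION id="a2"]<img src="two.jpg" /> Second[/caption] end';

// The ~ delimiters avoid having to escape the slash in [/caption];
// the i modifier makes the tag name case-insensitive.
$pattern = '~\[caption(.*?)\](.*?)\[/caption\]~i';

preg_match_all($pattern, $str, $matches);

// $matches[0]: whole blocks, tags included
// $matches[1]: attribute text following "[caption"
// $matches[2]: content between the opening and closing tags
$blocks     = $matches[0];
$attributes = array_map('trim', $matches[1]); // trim extraneous spacing
$contents   = array_map('trim', $matches[2]);

print_r($blocks);
```

Note that . does not match newlines by default, so if a caption can span several lines the s modifier (~is) would probably be needed as well.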

It worked just like I wanted.

Is it also possible to remove the [caption][/caption] tags and keep only the text, using regular expressions?

Thanks for your help.

Stick that pattern into preg_replace, passing it "$2" as the replacement.

I just noticed it's returning only a single [caption][/caption] block. The string can contain two or more. Sorry, I should have mentioned this in my first post.

As for preg_replace, unfortunately the link was kept.

echo preg_replace("~\[caption(.*?)\](.*?)\[/caption\]~i", "$2", $str);

Some text... <a href="http://www.site.com"><img class="size-full" src="http://www.site.com" alt="pic" width="640" height="345" /><\/a> Photo by ...more text

Thanks once again.

So use preg_match_all instead of preg_match? ;)

And if you want to remove HTML tags, use strip_tags.
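To illustrate the difference, here is a rough sketch comparing the two functions on a string with two caption blocks, followed by strip_tags on the captured content. The input string and variable names are assumptions made for this example.

```php
<?php
// Hypothetical input containing two caption blocks.
$str = 'A [caption id="1"]<b>one</b>[/caption] B [caption id="2"]<i>two</i>[/caption] C';
$pattern = '~\[caption(.*?)\](.*?)\[/caption\]~i';

// preg_match stops after the first match and returns 1 (or 0 if nothing matched).
$firstCount = preg_match($pattern, $str, $first);

// preg_match_all keeps scanning and returns the number of full matches.
$allCount = preg_match_all($pattern, $str, $all);

// strip_tags removes the HTML markup from each captured content string.
$plain = array_map('strip_tags', $all[2]);

print_r($plain);
```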

Sorry, I got confused between my local and production environments: I hadn't replaced preg_match with preg_match_all.

All is working now. Thanks!

There is still an issue.

$str = "Some text... [caption id=\"attachment_275305\" align=\"aligncenter\" width=\"640\"]<a href=\"http://www.site.com\"><img class=\"size-full\" src=\"http://www.site.com\" alt=\"pic\" width=\"640\" height=\"345\" /><\/a> Photo by[/caption] ...more text";

echo strip_tags(preg_replace("~\[caption(.*?)\](.*?)\[/caption\]~i", "$2", $str));

The result is:
"Some text... Photo by ...more text"

Instead of:
"Some text... ...more text"

I wanted to remove all content within the [caption][/caption] block. Why does "Photo by" still appear?

Because you said you wanted to extract it. That's what the regex is doing.

If you mean you want to replace it with an empty string, change the "$2" to "".
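A short sketch contrasting the two replacements. The sample string is a shortened, hypothetical version of the one in the question, and $kept / $removed are illustrative names.

```php
<?php
// Shortened, hypothetical version of the string from the question.
$str = 'Some text... [caption id="1"]<a href="#">link</a> Photo by[/caption] ...more text';
$pattern = '~\[caption(.*?)\](.*?)\[/caption\]~i';

// '$2' keeps the content between the tags; only the [caption] markers disappear.
$kept = preg_replace($pattern, '$2', $str);

// An empty replacement drops the whole block, content included.
$removed = preg_replace($pattern, '', $str);

echo $kept, "\n";
echo $removed, "\n";
```

The spaces on either side of the removed block stay in place, so a follow-up step may be wanted to collapse the resulting double space.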

Working now. Thanks.
