Finding tokens within a big string, keeping them, and then replacing them


#1

So let’s say I have a string as follows:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut sagittis et metus vitae euismod. Etiam mi quam, accumsan a ligula ac, venenatis viverra erat. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Quisque hendrerit lacinia vehicula. Cras lobortis nulla nec diam gravida cursus. Vestibulum condimentum convallis leo nec luctus. Donec ut risus a lorem mollis iaculis. Praesent &1 auctor, metus ac pellentesque semper, ipsum leo pellentesque eros, nec imperdiet diam risus non velit. Donec vitae urna et neque blandit pretium. Orci varius natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec id congue enim, nec rutrum &2 nibh. Sed nec lacinia est. Donec elementum dignissim urna et eleifend. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia &3 Curae; Quisque et gravida orci. Proin vitae lobortis sapien.

What I want to do is parse that big string to find every “token” beginning with an ampersand (&), store each token found (including the &) in an array, and then replace each token within the string with a value from a separate array. Basically, a preg_match_all() that stores the complete tokens but also replaces them with values from another array.

What’s the best way to do something like that?

I’ve seen a couple of examples using regular expressions w/ preg_match_all(), which I have always thought were resource intensive…but also a few using explode() combined with looping.

I’m not sure which is best…

Thoughts?


#2

Sounds like strtr would work.


#3

Thanks, Scallio. Thing is, I first need to parse the entire string to grab every &N token so that I can compile an entire array which will tell me how many of these tokens I have that I need to work with. I’m not entirely sure how to do this given the following token evaluation conditions:

  • Each token starts with an ampersand.
  • Each token should have only numbers after the ampersand and nothing else.
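
For reference, those two conditions map fairly directly onto a short regular expression. This is only a minimal sketch: the sample text is shortened, and it assumes a token simply ends where the digits end (so `&1abc` would still yield `&1`).

```php
<?php
// Shortened sample text; "AT&T" is included to show a non-token ampersand.
$text = 'Praesent &1 auctor, nec rutrum &2 nibh. cubilia &3 Curae; AT&T stays.';

// An ampersand followed by one or more digits, and nothing else.
preg_match_all('/&\d+/', $text, $matches);

$tokens = $matches[0];      // every complete match, ampersand included
$count  = count($tokens);   // how many tokens there are to work with
```

Here `$tokens` holds `&1`, `&2` and `&3`, and the ampersand in "AT&T" is ignored because no digit follows it.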

I’ll definitely look into strtr() and see what I can do with that. :slight_smile:


#4

Okay, strtr can’t help you find those tokens, but it can help replace them.
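
A minimal sketch of the replacement step with strtr(), assuming the tokens are already known. The `$map` values are made up for illustration.

```php
<?php
$text = 'Praesent &1 auctor, nec rutrum &2 nibh, then &10.';

// Hypothetical replacement map: token => value.
$map = ['&1' => 'alpha', '&2' => 'beta', '&10' => 'kappa'];

// With an array argument, strtr() tries the longest keys first,
// so '&10' is not mistaken for '&1' followed by a '0'.
$result = strtr($text, $map);
```

That longest-key-first behaviour is the main reason to prefer strtr() over a chain of str_replace() calls here.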

To find them you can use looping with strpos (hard to code but fast) or regular expressions (easy to code but a bit slower). Deciding which to pick also depends on the length of the text, as the difference will get more significant when you throw more text at it.
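
Roughly, the strpos loop could look like the sketch below. `findTokens` is a hypothetical helper name, and pairing strpos() with strspn() to measure the digit run is just one way to write it.

```php
<?php
// Hypothetical helper: collect every '&' + digits token without regex.
function findTokens(string $text): array
{
    $tokens = [];
    $offset = 0;
    while (($pos = strpos($text, '&', $offset)) !== false) {
        // Length of the run of digits directly after the ampersand.
        $len = strspn($text, '0123456789', $pos + 1);
        if ($len > 0) {
            $tokens[] = substr($text, $pos, $len + 1);
        }
        // Continue searching after the ampersand and any digits consumed.
        $offset = $pos + 1 + $len;
    }
    return $tokens;
}
```

For example, `findTokens('a &1 b &22 c & d &x &3')` returns `&1`, `&22` and `&3`, skipping the bare `&` and `&x`.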


#5

I would go with the explode() approach first: split the string, loop over the resulting array to build a new array with the replacements in between, and then just implode() afterwards. It is simple to understand, but costs RAM. If you can’t use a text format that is easier for PHP to read, you may be able to convert it with some string replacements into something vsprintf() can parse.

http://php.net/manual/en/function.vsprintf.php
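
Both ideas can be sketched as follows. The function names are hypothetical, and the vsprintf() variant assumes the tokens are numbered from 1 and that the values array has at least as many entries as the highest token number (otherwise vsprintf() fails, with an error on PHP 8).

```php
<?php
// Approach 1: explode() on '&', rebuild piece by piece, then implode().
function replaceByExplode(string $text, array $values): array
{
    $parts  = explode('&', $text);
    $out    = [array_shift($parts)];            // text before the first '&'
    $tokens = [];
    foreach ($parts as $part) {
        $len   = strspn($part, '0123456789');   // digits right after the '&'
        $token = '&' . substr($part, 0, $len);
        if ($len > 0) {
            $tokens[] = $token;
        }
        if ($len > 0 && isset($values[$token])) {
            $out[] = $values[$token] . substr($part, $len);
        } else {
            $out[] = '&' . $part;               // not a token, or no value: restore the '&'
        }
    }
    return [$tokens, implode('', $out)];
}

// Approach 2: turn '&N' into '%N$s' and let vsprintf() fill in the values.
function replaceByVsprintf(string $text, array $values): string
{
    $format = str_replace('%', '%%', $text);                 // escape literal percent signs
    $format = preg_replace('/&(\d+)/', '%${1}$s', $format);  // e.g. &2 becomes %2$s
    return vsprintf($format, $values);                       // &1 maps to $values[0]
}
```

The explode() version also returns the list of tokens it found, while the vsprintf() version only does the replacement.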


#6

I agree with ScallioXTX that a lot would depend on the length of the string and how large the resultant array would be before it would make any significant difference. But I would lean towards using regex to put together an array of only what I was interested in rather than getting an array of everything and getting what I wanted out of it.
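
As a rough illustration of the regex route, preg_replace_callback() can collect the tokens and replace them in a single pass. `replaceTokens` and the sample values are hypothetical, and tokens without a replacement are simply left in place.

```php
<?php
// Hypothetical helper: returns [tokens found, text with replacements applied].
function replaceTokens(string $text, array $values): array
{
    $found  = [];
    $result = preg_replace_callback(
        '/&\d+/',
        function (array $m) use (&$found, $values): string {
            $found[] = $m[0];                 // keep the complete token, '&' included
            return $values[$m[0]] ?? $m[0];   // no value defined: leave the token as-is
        },
        $text
    );
    return [$found, $result];
}

[$tokens, $result] = replaceTokens(
    'Praesent &1 auctor, rutrum &2 nibh, cubilia &3 Curae;',
    ['&1' => 'one', '&2' => 'two']
);
```

Here `$tokens` ends up with all three tokens, while `&3` stays in the text because the map has no entry for it.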

As for loops, I think that’s going to happen somewhere regardless. Either in loops written into the script or behind the scenes in a native function.

But as posted, any difference is likely to be negligible unless you are dealing with a large amount of data. So if you are hesitant to use regex and more comfortable with string functions, either approach can work.