I want to extract the link “/some/domain/” and the text “Home” into an array like $bc = array('link' => $link, 'title' => $title).
Should I be using a regular expression for this kind of extraction, or some string manipulation functions instead?
I have been trying to find a regular expression pattern that fits this, but I’m not really sure how the engine works and I couldn’t find a solution.
I was thinking of a pattern that captures anything between href=" and the closing quote, and anything between <a href…> and </a>.
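The pattern described above can be sketched roughly as follows. This is only a minimal illustration, assuming well-formed links with double-quoted href attributes; the sample markup in $html and the variable names are assumptions modelled on the breadcrumb shown later in this thread, and a regular expression like this will not cope with arbitrary HTML.

```php
<?php
// Sample markup (an assumption, modelled on the breadcrumb in this thread).
$html = '<div class="breadcrumb">'
      . '<a href="/example/domain/">Home</a> &raquo; '
      . '<a href="/example/domain/gallery">Photo Gallery</a>'
      . '</div>';

// Group 1: everything between href=" and the closing quote.
// Group 2: everything between the opening <a ...> tag and </a> (non-greedy).
$pattern = '/<a\s[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/s';

$bc = array();
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // One entry per breadcrumb link, in document order.
        $bc[] = array('link' => $m[1], 'title' => $m[2]);
    }
}

print_r($bc);
```

Using PREG_SET_ORDER keeps each link and its title together in one match, which makes building the array of link/title pairs straightforward.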
Just to clear the doubts.
Yup, the manipulation should be done before the system starts printing all the output to screen.
Although technically, it could also be done with JavaScript.
I’m trying to extract all the information (the link and title) from the breadcrumb that Drupal generated, and rebuild it. The reason to rebuild it is simply to add a class name to each <a> tag, and to append some text to the end of the breadcrumb.
I finally found the pattern to get the link: /(?<=href="\/)[\w\d\/]*/
and the pattern to get the title: /[\w\s]*(?=<\/a>)/
The only irritating thing is that /[\w\s]*(?=<\/a>)/ also generates empty values like this:
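The empty values can be reproduced with a small sketch. The `*` quantifier permits a zero-length match, so once “Home” has been consumed the engine can match the empty string at the position right before </a>; switching to `+` requires at least one character. The sample string in $html is an assumption used only for illustration.

```php
<?php
// A single sample link (an assumption for illustration).
$html = '<a href="/example/domain/">Home</a>';

// "*" allows a zero-length match directly in front of </a>,
// right after "Home" has already been matched.
preg_match_all('/[\w\s]*(?=<\/a>)/', $html, $star);

// "+" requires at least one character, so no empty match is possible.
preg_match_all('/[\w\s]+(?=<\/a>)/', $html, $plus);

var_dump($star[0]); // "Home" followed by an empty string
var_dump($plus[0]); // only "Home"
```

If the `*` version has to stay for some reason, passing the matches through array_filter() would be another way to drop the empty strings.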
So then you can loop through the links in your own theme implementation of theme_breadcrumb() and display them however you like, using the workflow of the system rather than hacking it.
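A theme override along these lines might look like the sketch below. It assumes the Drupal 6 signature, where the function receives an array of already-rendered link strings (Drupal 7 passes them as $variables['breadcrumb'] instead); “mytheme” is a placeholder theme name, and the class name and appended text are taken from the question. It also assumes each rendered link starts with `<a ` and has no class attribute yet.

```php
<?php
// Hypothetical override; "mytheme" is a placeholder for the real theme name.
// Assumes the Drupal 6 signature: $breadcrumb is an array of rendered <a> strings.
function mytheme_breadcrumb($breadcrumb) {
    if (empty($breadcrumb)) {
        return '';
    }

    $links = array();
    foreach ($breadcrumb as $link) {
        // Add the class name to the opening tag of each rendered link.
        $links[] = preg_replace('/^<a /', '<a class="breadcrumblink" ', $link, 1);
    }

    // Append the extra text as the final crumb.
    $links[] = 'New Text';

    return '<div class="breadcrumb">' . implode(' &raquo; ', $links) . '</div>';
}
```

Because the function only works on the array it is given, the links never have to be parsed back out of the finished HTML.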
An example using the DOM extension could look something like the following. Note: I “fixed” your broken HTML snippet (the &raquo; entities didn’t have semicolons) in my example.
$snippet = '<div class="breadcrumb">
<a href="/example/domain/">Home</a>
&raquo;
<a href="/example/domain/gallery">Photo Gallery</a>
&raquo;
<a href="/example/domain/image/tid/78">Cambodia</a>
</div>';

// Parse the snippet and locate the breadcrumb wrapper
$doc = new DOMDocument();
$doc->loadHTML($snippet);
$wrapper = $doc->getElementsByTagName('div')->item(0);

// Add a class name to every link in the breadcrumb
foreach ($wrapper->getElementsByTagName('a') as $anchor) {
    $anchor->setAttribute('class', 'breadcrumblink');
}

// Append "» New Text" after the last anchor
$wrapper->appendChild($doc->createTextNode('» New Text'));

// Serialise only the wrapper element
echo $doc->saveXML($wrapper);
In the output, the » characters appear as numeric entities. They might look strange, but they’re just an alternate XML-friendly representation of the » character.
In an ideal world, we would use a DOMDocumentFragment (rather than a document) and be able to use saveHTML (rather than saveXML), but the above gets the job done.
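On PHP 5.3.6 and later, DOMDocument::saveHTML() accepts an optional node argument, so the wrapper can be serialised as HTML directly. The following is a minimal sketch under that version assumption; the shortened snippet is only for illustration.

```php
<?php
// Shortened sample markup (an assumption for illustration).
$snippet = '<div class="breadcrumb">'
         . '<a href="/example/domain/">Home</a> &raquo; '
         . '<a href="/example/domain/gallery">Photo Gallery</a>'
         . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($snippet);
$wrapper = $doc->getElementsByTagName('div')->item(0);

// Add the class name to every link inside the wrapper.
foreach ($wrapper->getElementsByTagName('a') as $anchor) {
    $anchor->setAttribute('class', 'breadcrumblink');
}

// Passing a node limits the HTML serialisation to that subtree (PHP 5.3.6+).
$html = $doc->saveHTML($wrapper);
echo $html;
```

This avoids the html and body wrapper elements that loadHTML() adds around the snippet, since only the div subtree is written out.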