Getting Data from within the Document

I’m studying JavaScript right now and learned that you can, for example, read the data within all the <h1> tags on a page and create links to them somewhere else in the page.

Perhaps I’m having a brain burp, but can you do this with PHP as well? That is, find all the <h1> tags in a page (the JavaScript equivalent is document.getElementsByTagName("h1")), get their content (innerHTML), and then use those strings somewhere else in the page (to make links, for example)?

Hopefully my question is clear. This is probably possible; I think I’m just forgetting, or not thinking clearly about, how to do it.

Help is appreciated; thank you.

Yes, you can parse text in any programming language. You can either use an XML parser and treat the webpage as an XML document, or you can use regular expressions and just look for patterns in the text (like the letters <h1> followed by some text followed by </h1>).

http://php.net/manual/en/book.simplexml.php
http://php.net/manual/en/book.xml.php
http://php.net/manual/en/function.preg-match-all.php
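
For illustration, here is a rough sketch of the regular-expression route using preg_match_all(). The sample markup and the variable names ($html, $headings) are made up for the example, and the pattern is only an assumption about what the headings look like; it will miss or mangle unusual markup, so treat it as a quick-and-dirty option rather than a robust parser.

```php
<?php
// Hypothetical sample markup standing in for the real page source.
$html = '<h1>First</h1><p>Some text</p><h1 class="title">Second</h1>';

// Match an opening <h1 ...> tag, lazily capture everything up to the
// next closing </h1>. Flags: i = case-insensitive, s = dot matches newlines.
preg_match_all('/<h1\b[^>]*>(.*?)<\/h1>/is', $html, $matches);

// $matches[1] holds the captured contents of each heading.
$headings = $matches[1];

foreach ($headings as $heading) {
    echo $heading, "\n";
}
```

The captured text is whatever sits between the tags, so any markup nested inside a heading comes along with it.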

Oh okay, yeah. I was hoping there’d be an easier way to interact with the DOM than actually scanning all of the text in the whole document.

Thanks for the quick answer.

Don’t use regex for a task like this. Instead, use the DOMDocument class. It has many of the same methods as the JavaScript DOM.


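$doc = new DOMDocument();
// loadHTML() parses an HTML string (DOMDocument has no fromXHTML() method)
$doc->loadHTML('.......');
// Returns a DOMNodeList of every <h1> element in the document
$headings = $doc->getElementsByTagName('h1');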
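
To show how the pieces might fit together, here is a minimal sketch that goes from markup to a list of links. It assumes the page is available as a string and that each heading already has an id attribute to link to; the sample markup and the names $html and $links are invented for the example. It uses loadHTML() to parse the string and textContent as the rough counterpart of reading a heading's text in JavaScript.

```php
<?php
// Hypothetical sample markup; in practice this would be the page's HTML.
$html = '<html><body>'
      . '<h1 id="intro">Introduction</h1><p>Some text</p>'
      . '<h1 id="usage">Usage</h1>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of printing them,
// since real-world HTML is often not perfectly formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('h1') as $h1) {
    // textContent gives the heading's text with any nested tags stripped.
    $text = trim($h1->textContent);
    // Assumes every heading carries an id; getAttribute() returns '' if not.
    $id = $h1->getAttribute('id');
    $links[] = '<a href="#' . htmlspecialchars($id) . '">'
             . htmlspecialchars($text) . '</a>';
}

echo implode("\n", $links), "\n";
```

With the sample markup above, this prints one anchor per heading, pointing at #intro and #usage.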

Ah, now that’s what I was looking for. Thanks a lot!