XPath 101
As an addition to my latest article, this entry will show you how to harness the power of XPath by example.
XPath is a query language for XML, akin to SQL for relational databases (OK, loosely akin!), which is used to select nodes from an XML document.
Let’s take a look at some examples:
Here’s our XML file to parse:
We can use XPath queries through the XmlDocument.SelectNodes method to return a set of nodes matching our query. Let’s set this up:
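using System.Xml;

// Server.MapPath only exists inside an ASP.NET page or handler; it maps a site-relative path to a physical file path.
string fileName = Server.MapPath("catalog.xml");
XmlDocument doc = new XmlDocument();
doc.Load(fileName);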
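If you want to try these queries outside of an ASP.NET page, where Server.MapPath isn’t available, here’s a minimal console sketch. It assumes a small, made-up catalog held in a string and parsed with XmlDocument.LoadXml; the titles and structure are hypothetical and not the contents of the original catalog.xml.

```csharp
using System;
using System.Xml;

// Hypothetical catalog standing in for catalog.xml (the real file isn't reproduced here).
const string catalogXml = @"<catalog>
  <cd><title>Definitely Maybe</title><artist>Oasis</artist></cd>
  <cd><title>Abbey Road</title><artist>The Beatles</artist></cd>
</catalog>";

XmlDocument doc = new XmlDocument();
// LoadXml parses XML from a string; Load(path) reads it from a file instead.
doc.LoadXml(catalogXml);

// The root (document) element should be <catalog>.
Console.WriteLine(doc.DocumentElement?.Name);
```

From here on, every SelectNodes call in this post works the same way against doc, whichever way it was loaded.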
Our XmlDocument is now ready to query. Rather than explain the specifics of the query language, I think it’s better to show by example. The full details of XPath can be found here.
OK, let’s select all the CDs in our catalog:
XmlNodeList cdNodes = doc.SelectNodes("catalog/cd");
Easy, eh? Notice that we just write out the “path” to where our nodes are found in the XML file, using / to separate each level of the hierarchy.
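To see what SelectNodes hands back, here’s a small sketch that loops over the matches and prints each title. The catalog is a hypothetical stand-in, and the second query with a leading slash is added only to show that, starting from the document node, the absolute path /catalog/cd picks out the same elements.

```csharp
using System;
using System.Xml;

// Hypothetical sample data standing in for catalog.xml.
XmlDocument doc = new XmlDocument();
doc.LoadXml(@"<catalog>
  <cd><title>Definitely Maybe</title><artist>Oasis</artist></cd>
  <cd><title>Abbey Road</title><artist>The Beatles</artist></cd>
  <cd><title>Revolver</title><artist>The Beatles</artist></cd>
</catalog>");

// Relative to the document node: step into <catalog>, then take every child <cd>.
XmlNodeList cdNodes = doc.SelectNodes("catalog/cd");

// A leading slash makes the path absolute (it starts from the document root).
XmlNodeList absolute = doc.SelectNodes("/catalog/cd");

foreach (XmlNode cd in cdNodes)
{
    // SelectSingleNode returns the first matching node, or null if nothing matches.
    Console.WriteLine(cd.SelectSingleNode("title")?.InnerText);
}
Console.WriteLine($"{cdNodes.Count} CDs ({absolute.Count} via the absolute path)");
```

Each item in the XmlNodeList is an XmlNode, so you can run further XPath queries relative to it, as the title lookup inside the loop does.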
Let’s get a little more complicated. The following XPath expression selects all CDs by the artist Oasis:
XmlNodeList cdNodes = doc.SelectNodes("//cd[artist='Oasis']");
Notice the double slash at the start of this expression. The double slash tells XPath to match cd elements anywhere in the document, regardless of where exactly they sit within the hierarchy. In practice, it saves us from writing out the whole path (if we did, the expression would be “catalog/cd[artist='Oasis']”). The trade-off is that // has to search the entire document, so on a large file an explicit path can be quicker.
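The difference between an explicit path and // only shows up when elements live at different depths. The sketch below uses a hypothetical catalog in which one Oasis CD is nested inside a made-up shelf element, so the two queries return different counts.

```csharp
using System;
using System.Xml;

// Hypothetical data: one <cd> sits directly under <catalog>, the other is nested
// inside an invented <shelf> element to show how / and // differ.
XmlDocument doc = new XmlDocument();
doc.LoadXml(@"<catalog>
  <cd><title>Definitely Maybe</title><artist>Oasis</artist></cd>
  <shelf>
    <cd><title>Be Here Now</title><artist>Oasis</artist></cd>
  </shelf>
</catalog>");

// Explicit path: only <cd> elements that are direct children of <catalog>.
int direct = doc.SelectNodes("catalog/cd[artist='Oasis']").Count;

// Double slash: <cd> elements at any depth in the document.
int anywhere = doc.SelectNodes("//cd[artist='Oasis']").Count;

Console.WriteLine($"catalog/cd: {direct}, //cd: {anywhere}");
```

With a flat file like the one in this post the two forms return the same nodes; the explicit path just tells XPath exactly where to look.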
The other change in this expression is that we’re asking only for cd elements that have an artist child element whose text is Oasis. The square brackets enclose a predicate: a condition that filters the nodes selected by the step in front of it. We can combine predicates using “and” and “or” (lowercase, as XPath requires).
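Here’s a sketch of combining conditions inside a predicate. It assumes a hypothetical catalog with made-up prices; the price comparison relies on XPath converting the price element’s text to a number before comparing.

```csharp
using System;
using System.Xml;

// Hypothetical sample data; the prices are invented for illustration.
XmlDocument doc = new XmlDocument();
doc.LoadXml(@"<catalog>
  <cd><title>Definitely Maybe</title><artist>Oasis</artist><price>9.99</price></cd>
  <cd><title>Abbey Road</title><artist>The Beatles</artist><price>12.50</price></cd>
  <cd><title>Revolver</title><artist>The Beatles</artist><price>10.00</price></cd>
</catalog>");

// "or": CDs by either artist.
XmlNodeList either = doc.SelectNodes("//cd[artist='Oasis' or artist='The Beatles']");

// "and": Beatles CDs whose <price> converts to a number below 11.
XmlNodeList cheapBeatles = doc.SelectNodes("//cd[artist='The Beatles' and price < 11]");

Console.WriteLine(either.Count);
foreach (XmlNode cd in cheapBeatles)
{
    Console.WriteLine(cd.SelectSingleNode("title")?.InnerText);
}
```

Because the XPath lives in a C# string rather than in XML, the < operator can be written as-is; you would only need &lt; if the expression were embedded in an XML attribute.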
Lastly, I’ll show how to grab a particular child node of each matching element. The following query returns the price of every Beatles CD in our catalog:
XmlNodeList priceNodes = doc.SelectNodes("//cd[artist='The Beatles']/price");
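Since the final /price step returns the price elements themselves, you usually read their text next. This sketch totals the Beatles prices from a hypothetical catalog (the figures are invented) and parses with the invariant culture so a value like 12.50 isn’t misread on machines that use a comma as the decimal separator.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical sample data; the prices are invented for illustration.
XmlDocument doc = new XmlDocument();
doc.LoadXml(@"<catalog>
  <cd><title>Definitely Maybe</title><artist>Oasis</artist><price>9.99</price></cd>
  <cd><title>Abbey Road</title><artist>The Beatles</artist><price>12.50</price></cd>
  <cd><title>Revolver</title><artist>The Beatles</artist><price>10.00</price></cd>
</catalog>");

// The last step (/price) means the results are <price> elements, not <cd> elements.
XmlNodeList priceNodes = doc.SelectNodes("//cd[artist='The Beatles']/price");

decimal total = 0m;
foreach (XmlNode price in priceNodes)
{
    // InnerText is the element's text content, e.g. "12.50".
    total += decimal.Parse(price.InnerText, CultureInfo.InvariantCulture);
}

Console.WriteLine($"{priceNodes.Count} prices, total {total.ToString(CultureInfo.InvariantCulture)}");
```

If you only expect one match, XmlNode.SelectSingleNode takes the same XPath and returns the first matching node (or null) instead of a list.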
So ends a quick guide to XPath :)