What I Learned at Work this Week: XPath
This week, I was writing a configuration to collect data from a webpage each time a certain button was clicked. My logic worked about 75% of the time, but I couldn’t figure out what was going wrong the other 25% (likely a race condition somewhere). I consulted the engineer who built the infrastructure I was working with, and he told me he was working on a possible solution for issues like this, one that used XPath. He spent some time explaining it, but I knew I’d have to do more research if this was going to become part of our codebase.
The X in XPath: XML
XPath is a syntax that points to certain DOM elements. Here’s an example, which might work if you open up devtools on Medium.com right now:
We can search this value in the Elements tab with cmd + f or ctrl + f and if your DOM is like mine, you’ll see a match:
The X in XPath stands for XML, or Extensible Markup Language. Like HTML, XML is built from hierarchical elements, though it’s meant for describing structured data rather than laying out a webpage. Here’s an example from MDN:
<?xml version="1.0" encoding="UTF-8"?>
At work, we were discussing an XPath option as part of a script or SDK, meaning we’d want it to be generally accessible to our clients. I wondered how much of the web was written in XML, but it turns out that XPath expressions work in HTML and SVG as well, assuming they’re supported by the browser.
XPath expressions are built using a path notation, like URLs or like a file system. Once we start our path, we’ll be looking within the scope of a node’s parents or children. The expression can specify nodes by name or attribute as well. Here’s a quick reference table from W3 on how nodes can be specified:
Let’s look at a few examples of XPath syntax. If you want to code along, you can head over to xpather.com. My example here uses their same DOM structure, but changes the text content and adds an attribute to a note node:
This is a blog post about XPath.
We will use these elements for XPath testing.
XPath can help us select the right element.
This example came from xpather.com
XPath can traverse up or down a DOM.
XPath can select based on node type, property, or position.
We made it to the end!
The first thing to note here is that this looks like HTML, but it’s using non-HTML elements like abstract, description, and subject. These are XML tags. If we wanted to use XPath to select all elements with the subject tag, we’d use two slashes to indicate that we’re selecting from anywhere in the document, then write our tag name:
My query starts to look more like a path if I want to see all subjects that are the child of a description (coincidentally, this will select the same two nodes):
I gave one of our note elements a name attribute. We can search attributes by adding a predicate, which is a filter condition placed inside square brackets. XPath uses @ to denote an attribute:
I mentioned that we can select parents as well as children. Say we want to know the parent of our findme element:
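Much like indexing into the results of querySelectorAll, we can pick one element from a group by adding a number as a predicate. Note that XPath positions count from 1, not 0. IE 5–9 treated the first element in a list as 0, like an array index, but modern browsers follow the spec and start from 1: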
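The expressions from the last few paragraphs can be collected in one place. This is only a sketch: the helper names are hypothetical, and I'm assuming that 'findme' is the value of the name attribute we added to the note element. The helpers just assemble strings, so they can be pasted into any JS console.

```javascript
// Hypothetical helpers that build the XPath expressions discussed above.
// They only assemble strings; nothing here touches a DOM.

// '//' selects from anywhere in the document; '/' steps down to a child.
function descendantPath(...tagNames) {
  return '//' + tagNames.join('/');
}

// A predicate in square brackets filters the step; '@' marks an attribute.
// Assumes the value itself contains no double quote.
function withAttribute(expression, attribute, value) {
  return `${expression}[@${attribute}="${value}"]`;
}

// '..' steps up to the parent of each matched node.
function parentOf(expression) {
  return `${expression}/..`;
}

// A numeric predicate picks the nth match of the final step within each
// parent, counting from 1.
function atPosition(expression, position) {
  return `${expression}[${position}]`;
}

const allSubjects = descendantPath('subject');
const subjectsInDescription = descendantPath('description', 'subject');
// Assumption: the attribute added to the note is name="findme".
const findMe = withAttribute(descendantPath('note'), 'name', 'findme');
const findMeParent = parentOf(findMe);
const firstNote = atPosition(descendantPath('extra-notes', 'note'), 1);

console.log(allSubjects);           // //subject
console.log(subjectsInDescription); // //description/subject
console.log(findMe);                // //note[@name="findme"]
console.log(findMeParent);          // //note[@name="findme"]/..
console.log(firstNote);             // //extra-notes/note[1]
```

Any of these strings can be pasted into xpather.com or the devtools Elements search to check what they match.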
This is only scratching the surface, as XPath also includes a bunch of functions. If we want to filter by the text content of our elements, we can use contains:
This says that we want to search all note nodes that are children of extra-notes. The square brackets indicate a predicate, and inside them we invoke our function. contains accepts two arguments. The first is the string to search within: here we use a . which stands for the current node (each note in turn) and is converted to its text content. The second is the string we’re searching for.
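Because the search string sits inside the expression as a quoted literal, quotes in the text need some care: XPath 1.0 string literals have no escape character. The sketch below shows one way to handle that. The helper names (xpathLiteral, containingText) and the search word 'end' are my own assumptions, not part of any XPath API.

```javascript
// Turn arbitrary text into a valid XPath 1.0 string literal.
function xpathLiteral(text) {
  if (!text.includes("'")) return `'${text}'`;
  if (!text.includes('"')) return `"${text}"`;
  // Text with both quote kinds: quote each piece between apostrophes and
  // glue the pieces back together with concat(), re-adding each apostrophe
  // as the double-quoted literal "'".
  const pieces = text.split("'").map((piece) => `'${piece}'`);
  const apostrophe = ", \"'\", ";
  return 'concat(' + pieces.join(apostrophe) + ')';
}

// Add a contains() predicate that tests the text content of each matched node.
function containingText(expression, text) {
  return `${expression}[contains(., ${xpathLiteral(text)})]`;
}

// Assumed search word, for illustration only.
console.log(containingText('//extra-notes/note', 'end'));
// //extra-notes/note[contains(., 'end')]
```

For plain words the literal is just the text in single quotes; the concat branch only matters when the text mixes both kinds of quote.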
This could be a whole post in and of itself, so we’ll just look at a pair of key functions that will allow us to select a node using XPath syntax instead of querySelector. First up is document.evaluate:
var xpathResult = document.evaluate( xpathExpression, contextNode, namespaceResolver, resultType, result );
This example comes from MDN, which provides a detailed explanation for this entire process and more. Based on my understanding of that article, the five arguments are:
- xpathExpression: We know what an XPath expression looks like by now.
- contextNode: We can specify the node we want to look within. If we want to look at the whole document, we can write document.
- namespaceResolver: This is a function that receives any namespace prefixes found inside our xpathExpression and returns the matching namespace URI as a string. We haven’t looked at namespace prefixes up to this point, so just know that sometimes XML elements can be called something like h:book or f:appendix and the part before the colon is a namespace prefix. If we’re working with an HTML document, we can provide a null namespace resolver. If not, we can create one with the createNSResolver function, which is part of the document prototype.
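- resultType: A constant that says what kind of result we want back, such as XPathResult.ANY_TYPE, XPathResult.NUMBER_TYPE, or XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.
- result: We can either pass in an existing XPathResult object or null. If we choose the former, that object is reused and its contents are replaced with the new result.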
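To keep the arguments straight, here is a minimal sketch of a wrapper that names each one and fills in the defaults described above. The function evaluateXPath and its options object are hypothetical; the only real API call is evaluate, which exists on document.

```javascript
// Numeric value of XPathResult.ANY_TYPE, spelled out so the sketch also
// loads outside a browser.
const ANY_TYPE = 0;

// Hypothetical wrapper that names each of the arguments to evaluate().
function evaluateXPath(doc, xpathExpression, options = {}) {
  const {
    contextNode = doc,        // search the whole document by default
    namespaceResolver = null, // null is fine for HTML documents
    resultType = ANY_TYPE,    // let the browser pick the natural result type
    result = null,            // null asks for a fresh XPathResult object
  } = options;
  return doc.evaluate(xpathExpression, contextNode, namespaceResolver, resultType, result);
}

// In a browser console:
//   evaluateXPath(document, '//h2')
//   evaluateXPath(document, 'count(//h2)', { resultType: XPathResult.NUMBER_TYPE }).numberValue
```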
If you want to see this in action, open a new tab and visit the medium.com homepage. We can use this expression in the console to select all h2 elements:
var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null );
This is very basic — we’re just selecting for h2 elements (first parameter) from the document (second parameter). We don’t have to use a namespace resolver for our HTML doc (third parameter), our result type is ANY_TYPE (fourth parameter) and we’re not providing a result object since we’re happy to create a new one (fifth argument).
If we check out our headings variable in the console, we’ll see that it’s an object with a resultType of 4 (UNORDERED_NODE_ITERATOR_TYPE, which is what ANY_TYPE resolves to when the expression matches a set of nodes) and not much else. It’s got properties for booleanValue, numberValue, stringValue, and singleNodeValue, but reading any of them throws an error because our node-iterator result cannot be converted into those other types. The same goes for snapshotLength, which would be populated with the number of h2 elements found if we had chosen ORDERED_NODE_SNAPSHOT_TYPE.
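For comparison, this is roughly what the snapshot route looks like. It's a sketch with a hypothetical helper name (snapshotNodes); snapshotLength and snapshotItem are the real XPathResult members, and 7 is the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE.

```javascript
// XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, written as a number so the sketch
// also loads outside a browser.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: run an expression and copy the snapshot into an array.
function snapshotNodes(doc, xpathExpression) {
  const snapshot = doc.evaluate(xpathExpression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i)); // snapshotItem is zero-based
  }
  return nodes;
}

// In a browser console: snapshotNodes(document, '//h2').length
```

A snapshot is a static copy of the matches, so later DOM changes don't invalidate it, although the nodes in it may no longer reflect the current page.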
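The last property here is invalidIteratorState. We can use an iterator to step through our collection, but the iterator becomes invalid if the DOM changes after the result was returned. This property can only be true for the two iterator result types (UNORDERED_NODE_ITERATOR_TYPE and ORDERED_NODE_ITERATOR_TYPE). An unordered iterator can still be iterated; it just doesn’t guarantee that nodes come back in document order.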
Speaking of iterators, we can use the XPathResult method iterateNext to select the first h2 element from our collection and then the classic innerText to return its text content. Here’s a screenshot showing it all coming together (including an error once the DOM changed):
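The iteration itself can be sketched as a small function. The name textOfMatches is hypothetical; iterateNext is the real XPathResult method, which returns null once the matches run out, and 5 is the numeric value of XPathResult.ORDERED_NODE_ITERATOR_TYPE.

```javascript
// XPathResult.ORDERED_NODE_ITERATOR_TYPE, written as a number so the sketch
// also loads outside a browser.
const ORDERED_NODE_ITERATOR_TYPE = 5;

// Hypothetical helper: walk an iterator result and return each node's text.
function textOfMatches(doc, xpathExpression) {
  const iterator = doc.evaluate(xpathExpression, doc, null, ORDERED_NODE_ITERATOR_TYPE, null);
  const nodes = [];
  // Collect every node up front: if the DOM is modified while we are still
  // iterating, the next iterateNext() call throws.
  let node = iterator.iterateNext();
  while (node !== null) {
    nodes.push(node);
    node = iterator.iterateNext();
  }
  return nodes.map((n) => n.innerText);
}

// In a browser console: textOfMatches(document, '//h2')
```

Gathering the nodes into an array first means any later changes to the page can't interrupt the loop halfway through.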