What I Learned at Work this Week: XPath

Photo by Tyler Lastovich from Pexels

This week, I was writing a configuration to collect data from a webpage each time a certain button was clicked. My logic worked about 75% of the time, but I couldn’t figure out what was going wrong the other 25% (likely a race condition somewhere). I consulted the engineer who built the infrastructure I was working with, and he told me he was working on a possible new solution for issues like this, one that used XPath. He spent some time explaining it, but I knew I’d need to do more research if this was going to become part of our codebase.

XPath is a query syntax for pointing at specific nodes in a document, including DOM elements. Here’s an example, which might work if you open up devtools on Medium.com right now:

//button/img[contains(@class, 'z')]

We can search for this expression in the Elements tab with cmd + f or ctrl + f, and if your DOM is like mine, you’ll see a match.

The X in XPath stands for XML, or Extensible Markup Language. Like HTML, XML uses hierarchical elements, but it’s meant to describe and carry data rather than to display a page. Here’s an example from MDN:

<?xml version="1.0" encoding="UTF-8"?>
<message>
<warning>
Hello World
</warning>
</message>

What sets XML apart from HTML is that it’s extensible. Whereas HTML displays information with a preset list of tags, XML is designed to store and transmit data and lets us define our own tags. If you’ve studied React, this might sound familiar because React components are usually written in JSX (often expanded as JavaScript XML), an XML-like syntax for describing elements. XPath relies on that same hierarchical structure of nested, named elements to find its way around a document.

At work, we were discussing an XPath option as part of a script or SDK, meaning we’d want it to be generally accessible to our clients. I wondered how much of the web was written in XML, but it turns out that XPath expressions work in HTML and SVG as well, assuming they’re supported by the browser.

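XPath expressions are built using a path notation, like URLs or paths in a file system. From any point in the path, we can move to a node’s parent or children. The expression can also specify nodes by name or attribute. Here’s a quick reference, based on the table from W3, on how nodes can be specified:

  • nodename: selects all nodes with the name nodename
  • /: selects from the root node
  • //: selects matching nodes anywhere in the document below the current node
  • .: selects the current node
  • ..: selects the parent of the current node
  • @: selects attributes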

Let’s look at a few examples of XPath syntax. If you want to code along, you can head over to xpather.com. My example here uses their same DOM structure, but changes the text content and adds an attribute to a note node:

<app>
<welcome-message>
Welcome!
</welcome-message>
<abstract>
This is a blog post about XPath.
</abstract>
<description>
<subject>
We will use these elements for XPath testing.
</subject>
<subject>
XPath can help us select the right element.
</subject>
</description>
<extra-notes>
<note>
This example came from xpather.com
</note>
<note>
XPath can traverse up or down a DOM.
</note>
<note>
XPath can select based on node type, property, or position.
</note>
<note name="findme">
We made it to the end!
</note>
</extra-notes>
</app>

The first thing to note here is that this looks like HTML, but it’s using non-HTML elements like abstract, description, and subject. These are XML tags. If we wanted to use XPath to select all elements with the subject tag, we’d use two slashes to indicate that we’re selecting from anywhere in the document, then write our tag name:

//subject

The query starts to look more like a path if I want all subject elements that are children of a description (in this document, that selects the same two nodes):

//description/subject

I gave one of our note elements a name attribute. We can filter on attributes by appending a predicate, which is a condition placed inside square brackets that a node must satisfy to be selected. XPath uses @ to denote an attribute:

//note[@name='findme']

I mentioned that we can select parents as well as children. Say we want to know the parent of our findme element:

//note[@name='findme']/..

By adding two periods to our path, we go one level up, just like in the relative paths we use when importing modules in JavaScript.

Much like :nth-of-type in a CSS selector, we can pick an element from a group by adding a number as a predicate. Positions in XPath start at 1 rather than 0. (IE 5–9 treated the first element as 0, like an array index, but modern browsers follow the standard and start from 1.)

//extra-notes/note[1]

This is only scratching the surface, as XPath also includes a bunch of functions. If we want to filter by the text content of our elements, we can use contains:

//extra-notes/note[contains(., 'XPath')]

This says that we want to look at all note nodes that are children of extra-notes and keep only the ones that pass the test in the predicate. The square brackets mark the predicate, and inside them we invoke our function. contains accepts two arguments. The first is the string to search within. Here we pass a ., which stands for the current node (each note in turn), so XPath uses that node’s text content. The second argument is the substring we’re searching for. The match is case-sensitive, so the first note, which only mentions xpather.com in lowercase, is not selected.

When my coworker was first explaining XPath to me, I wondered how it would fit into our currently existing structure. Does it replace querySelector? Does it work with JavaScript? It works with JS, but we won’t be using querySelector.

This could be a whole post in and of itself, so we’ll just look at a pair of key functions that will allow us to select a node using XPath syntax instead of querySelector. First up is document.evaluate:

var xpathResult = document.evaluate( xpathExpression, contextNode, namespaceResolver, resultType, result );

This example comes from MDN, which provides a detailed explanation for this entire process and more. Based on my understanding of that article, the five arguments are:

  • xpathExpression: We know what an XPath expression looks like by now.
  • contextNode: we can specify the node we want to look within. If we want to look at the whole document, we can write document.
  • namespaceResolver: This is a function that maps any namespace prefixes found inside our xpathExpression to namespace URIs. We haven’t looked at namespace prefixes up to this point, so just know that sometimes XML elements can be called something like h:book or f:appendix and the part before the colon is a namespace prefix. If we’re working with an HTML document, we can pass null. If not, we can write our own resolver function or create one with the createNSResolver function, which is a method on document.
  • resultType: We usually don’t get to decide what data type we get back from a JavaScript function, but document.evaluate gives us that option. This value is one of a set of numeric constants defined on XPathResult. Common options include XPathResult.ANY_TYPE, XPathResult.STRING_TYPE, and XPathResult.NUMBER_TYPE.
  • result: We can either pass in an existing XPathResult object or null. If we choose the former, that object is reused to hold the new results.
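
To make the namespaceResolver argument a little more concrete, here is a minimal sketch of a hand-written resolver. The h and f prefixes come from the bullet above, but the URIs are made up for illustration, and makeResolver is a hypothetical helper of my own, not a browser API. It assumes that document.evaluate will accept a plain function that takes a prefix and returns a namespace URI (or null for an unknown prefix).

```javascript
// Hypothetical helper: builds a resolver from a prefix -> URI table.
function makeResolver(namespaces) {
  // A Map avoids accidentally matching inherited keys such as "toString".
  const table = new Map(Object.entries(namespaces));
  // document.evaluate calls the resolver with each prefix it needs to look up.
  return (prefix) => (table.has(prefix) ? table.get(prefix) : null);
}

// Made-up URIs for the h: and f: prefixes (assumptions, for illustration only).
const resolver = makeResolver({
  h: 'http://example.com/ns/books',
  f: 'http://example.com/ns/appendices',
});

console.log(resolver('h')); // http://example.com/ns/books
console.log(resolver('x')); // null

// In a browser, with an XML document `xmlDoc` that uses these namespaces:
// xmlDoc.evaluate('//h:book', xmlDoc, resolver, XPathResult.ANY_TYPE, null);
```

For an HTML page none of this is needed, which is why the examples that follow can pass null.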

If you want to see this in action, open a new tab and visit the medium.com homepage. We can use this expression in the console to select all h2 elements:

var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null );

This is very basic — we’re just selecting for h2 elements (first parameter) from the document (second parameter). We don’t have to use a namespace resolver for our HTML doc (third parameter), our result type is ANY_TYPE (fourth parameter) and we’re not providing a result object since we’re happy to create a new one (fifth argument).

If we check out our headings variable in the console, we’ll see that it’s an object with a resultType of 4 and not much else. We asked for ANY_TYPE, so the browser picked the type that fits a set of nodes, which is UNORDERED_NODE_ITERATOR_TYPE (4). It’s got properties for booleanValue, numberValue, stringValue, and singleNodeValue, but reading them throws an error because an iterator result can’t be read as those other types. The same goes for snapshotLength, which would be populated with the number of h2 elements found if we had chosen ORDERED_NODE_SNAPSHOT_TYPE.
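
The 4 is easier to read once we know which constant it stands for. Here is a small sketch that maps the numeric codes back to their XPathResult names. The list is copied by hand so the snippet also runs outside a browser, and describeResultType is a hypothetical helper, not something the browser provides.

```javascript
// Constant names in numeric order, so the array index equals the resultType code.
const RESULT_TYPE_NAMES = [
  'ANY_TYPE', // 0
  'NUMBER_TYPE', // 1
  'STRING_TYPE', // 2
  'BOOLEAN_TYPE', // 3
  'UNORDERED_NODE_ITERATOR_TYPE', // 4
  'ORDERED_NODE_ITERATOR_TYPE', // 5
  'UNORDERED_NODE_SNAPSHOT_TYPE', // 6
  'ORDERED_NODE_SNAPSHOT_TYPE', // 7
  'ANY_UNORDERED_NODE_TYPE', // 8
  'FIRST_ORDERED_NODE_TYPE', // 9
];

// Hypothetical helper: turns a resultType number into its constant name.
function describeResultType(resultType) {
  return RESULT_TYPE_NAMES[resultType] || 'UNKNOWN';
}

console.log(describeResultType(4)); // UNORDERED_NODE_ITERATOR_TYPE

// In a browser console, with the headings variable from above:
// describeResultType(headings.resultType);
```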

The last property here is invalidIteratorState. We can use an iterator to walk through our collection, but the iterator becomes invalid if the DOM changes after the result was returned, and this property flips to true when that happens. Choosing UNORDERED_NODE_ITERATOR_TYPE doesn’t make it invalid by itself: we can still iterate, we just aren’t guaranteed to get the nodes in document order.
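
If a changing DOM is a concern, one option is to ask for a snapshot instead of an iterator, since a snapshot is a static list that we read by index. The sketch below is only an illustration: selectAll is a hypothetical helper, and the constant is written out as 7 (the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE) so the function has no hard dependency on browser globals.

```javascript
// Same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the browser.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: returns every match as a plain array.
// `doc` is anything with an evaluate method, such as the browser's document.
function selectAll(doc, expression) {
  const snapshot = doc.evaluate(expression, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  // snapshotLength and snapshotItem are only usable on snapshot result types.
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// In a browser console:
// selectAll(document, '//h2').map((h2) => h2.innerText);
```

The trade-off is that a snapshot won’t reflect later changes to the page, so nodes in the array may have been removed from the DOM by the time we use them.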

Speaking of iterators, we can use the XPathResult method iterateNext to select the first h2 element from our collection and then the classic innerText to return its text content. Here’s a screenshot showing it all coming together (including an error once the DOM changed):

I do not advocate following any advice from Elon Musk
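
In text form, that iteration could look roughly like the sketch below. collectText is a hypothetical helper name, and the loop assumes the result is one of the iterator types, where iterateNext returns null once there are no nodes left.

```javascript
// Hypothetical helper: walks an iterator-type XPathResult and collects text.
function collectText(result) {
  const texts = [];
  let node = result.iterateNext();
  while (node) {
    texts.push(node.innerText);
    // In a browser, this call throws if the DOM changed after evaluate ran.
    node = result.iterateNext();
  }
  return texts;
}

// In a browser console:
// var headings = document.evaluate('//h2', document, null, XPathResult.ANY_TYPE, null);
// collectText(headings);
```

Wrapping the loop in a try/catch would be one way to handle the error from a DOM change, at the cost of ending up with a partial list.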

Using querySelector seems to be the default for a lot of JavaScript users, so I’m still learning about when XPath is preferred. It seems that certain browsers are a little faster with one instead of the other, so that might be why my coworker suggested it. I’m sure there’s more to come there so, as always, I’m just excited to feel more prepared for the journey.

Solutions Engineer