XPath Embedded DSL

The XML XPath specification allows navigation of XML documents via a DSL that describes routes through a document using a combination of axes, steps and predicates. It has a limited number of these abstractions, but together they create a powerful and direct querying language that remains simple to use.

Scales provides this power via both a traditional string-based approach and an embedded DSL that leverages Scala's syntactic flexibility to mimic the XPath syntax.

The DSL uses the existing Scales abstractions to the full, and works via a zipper over the XmlTree itself. Each navigation step through the tree creates new zippers and new paths through the tree.
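To make the zipper idea concrete, here is a minimal sketch of an immutable location type over a toy tree. The names Tree, Loc, atRoot, down, up and route are all hypothetical and are not the Scales XmlTree or XmlPath API; the sketch only illustrates that each navigation step returns a new location that remembers how it was reached.

```scala
object ZipperSketch {
  // Hypothetical toy tree, standing in for an XML tree.
  final case class Tree(label: String, children: Vector[Tree] = Vector.empty)

  // A location: the focused tree plus, unless at the root,
  // the parent location and the index of the focus within it.
  final case class Loc(focus: Tree, parent: Option[(Loc, Int)]) {
    // Moving down returns a new Loc; the current one is left unchanged.
    def down(i: Int): Option[Loc] =
      if (i >= 0 && i < focus.children.size)
        Some(Loc(focus.children(i), Some((this, i))))
      else None

    // Moving up returns the remembered parent location, if any.
    def up: Option[Loc] = parent.map(_._1)

    // Labels from the root down to the focus.
    def route: List[String] =
      parent.fold(List(focus.label)) { case (p, _) => p.route :+ focus.label }
  }

  def atRoot(t: Tree): Loc = Loc(t, None)

  val doc = Tree("Elem", Vector(
    Tree("Child"),
    Tree("Child2", Vector(Tree("Subchild")))))

  // Two steps down: the second child, then its first child.
  val subchild: Option[Loc] = atRoot(doc).down(1).flatMap(_.down(0))
}
```

Because every step yields a fresh location, several paths through the same tree can be held at once without interfering with each other.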

Wherever possible (the namespace:: axis being the exception) the range of behaviours closely follows the specification, with like-for-like queries matching 100%. Instead of matching on prefixes, Scales matches against fully qualified expanded QNames (qualifiedName in the QName Functions), so no prefix context is required to evaluate a query.
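The effect of matching on expanded names rather than prefixes can be sketched with a toy name type. The Name class below is a hypothetical stand-in, not the Scales QName type; it assumes only that a match compares namespace and local part and ignores the prefix.

```scala
object QNameSketch {
  // Hypothetical stand-in for a qualified name; not the Scales QName type.
  final case class Name(namespace: String, local: String, prefix: Option[String]) {
    // Expanded form in the {namespace}local notation.
    def expanded: String = s"{$namespace}$local"

    // Names match when namespace and local part agree; the prefix is ignored.
    def matches(other: Name): Boolean =
      namespace == other.namespace && local == other.local
  }

  // Same namespace and local name, different prefixes.
  val a = Name("test:uri:attribs", "attr3", Some("pre"))
  val b = Name("test:uri:attribs", "attr3", Some("other"))

  // Same prefix and local name, different namespace.
  val c = Name("test:uri", "attr3", Some("pre"))
}
```

Here a matches b despite the differing prefixes, while a does not match c, which is why no prefix-to-namespace mapping has to be supplied alongside a query.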

Internally, perhaps unsurprisingly, XPath is implemented as a combination of filter, map and flatMap. When results are retrieved (e.g. by converting to an Iterable) they are sorted into document order; this can be expensive for large result sets (see Unsorted Results for alternatives).
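As a rough illustration of that idea, the sketch below expresses steps over a toy node type with flatMap, filter and map, and sorts the final result into document order. Node, its order field and all function names are assumptions made for the example; this is not how Scales itself is written.

```scala
object StepSketch {
  // Hypothetical toy node: `order` is the node's position in document order.
  final case class Node(name: String, order: Int, children: List[Node] = Nil)

  // A child step is a flatMap from each context node to its children.
  def child(ctx: List[Node]): List[Node] = ctx.flatMap(_.children)

  // A name test or predicate is a filter over the current node set.
  def named(ctx: List[Node], name: String): List[Node] =
    ctx.filter(_.name == name)

  // descendant-or-self: each node followed by the descendants of its children.
  def descendantOrSelf(ctx: List[Node]): List[Node] =
    ctx.flatMap(n => n :: descendantOrSelf(n.children))

  // On retrieval, duplicates are dropped and nodes sorted into document order.
  def inDocumentOrder(nodes: List[Node]): List[Node] =
    nodes.distinct.sortBy(_.order)

  val subchild = Node("Subchild", 3)
  val doc = Node("Elem", 0,
    List(Node("Child", 1), Node("Child2", 2, List(subchild))))

  // Roughly /Elem/Child2/Subchild on the toy tree; map extracts the names.
  val found: List[String] =
    child(named(child(List(doc)), "Child2")).map(_.name)

  // Two overlapping context nodes produce duplicates before sorting.
  val all: List[String] =
    inDocumentOrder(descendantOrSelf(List(doc.children(1), doc))).map(_.name)
}
```

The final distinct and sortBy pass is the part that grows with the size of the result set, which is the cost the paragraph above refers to.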

Simple Usage Examples

Given the following document:

  val ns = Namespace("test:uri")
  val nsa = Namespace("test:uri:attribs")
  val nsp = nsa.prefixed("pre")

  val builder =
    ns("Elem") /@ (nsa("pre", "attr1") -> "val1",
                   "attr2" -> "val2",
                   nsp("attr3") -> "val3") /(
      ns("Child"),
      "Mixed Content",
      ns("Child2") /( ns("Subchild") ~> "text" )
    )

we can easily query for the Subchild:

  // top produces a Path from a Tree, in this case an XPath
  val path = top(builder)

  val res = path \* ns("Child2") \* ns("Subchild")
  res.size // 1

  string(res) // text
  qname(res) // Subchild

XPath Axes

Scales supports the complete set of useful XPath axes:

A commonly used abbreviation not listed above is of course \\, which means descendant_or_self_::. The difference is that \\ also supports possible eager evaluation and, as per the spec, the notion of \\ at the beginning of an expression.

NB: The Scales embedded XPath DSL does not support the namespace axis. If you have a requirement for it, it can be looked at (please send an email to the mailing list to discuss possible improvements).

Node Tests

Scales embedded XPath DSL views the majority of node tests as predicates

Scales XML also adds:

Predicates

There are three areas allowing for predicates within XPaths:

The first two are special cased, as in the XPath spec, as they are the most heavily used predicates (using the above example document):

  // QName based match
  val attributeNamePredicates = path \@ nsp("attr3")
  string(attributeNamePredicates) // "val3"
  
  // predicate based match
  val attributePredicates = path \@ ( string(_) == "val3" )
  qualifiedName(attributePredicates) // {test:uri:attribs}attr3

  // Find child descendants that contain a Subchild 
  val elemsWithASubchild = path \\* ( _ \* ns("Subchild"))
  string(elemsWithASubchild) // text
  qualifiedName(elemsWithASubchild) // {test:uri}Child2

In each case the XmlPath (or AttributePath) is passed to the predicate. There are also a number of shortcuts for the common QName-based matches and for positional matches on elements:

  val second = path \*(2) // path \* 2 is also valid but doesn't read like \*[2]
  qname(second) // Child2

The developer can choose to ignore namespaces (not recommended) by using the *:* and *:@ predicates instead (equivalent to string xpath /*= "x").

Positional Predicates

These positional tests are more difficult to model, but they can be used in the same way as position() and last() in XPath.

So, for example:

  // /*[position() = last()]
  val theLast = path.\.pos_eq_last
  qname(theLast) // Elem

  // //*[position() = last()]
  val allLasts = path.\\*.pos_eq_last
  allLasts map(qname(_)) // List(Elem, Child2, Subchild)

  // all elems with more than one child
  // //*[ ./*[last() > 1]]
  val moreThanOne = path.\\*( _.\*.last_>(1) )
  qname(moreThanOne) // Elem

  // all elems that aren't the first child
  // //*[ position() > 1]
  val notFirst = path.\\*.pos_>(1)
  qname(notFirst) // Child2
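The semantics of position() and last() can be sketched independently of the DSL. The example below assumes, for illustration only, that the matched elements are grouped by the context node they were selected from, so that each element has a position within its group and each group has a last index. The function names are hypothetical and not part of Scales.

```scala
object PositionSketch {
  // Pair each item with its 1-based position and its group's last index.
  def withPositions[A](siblings: List[A]): List[(A, Int, Int)] = {
    val last = siblings.size
    siblings.zipWithIndex.map { case (a, i) => (a, i + 1, last) }
  }

  // Keeps items where position() = last(), evaluated per group.
  def posEqLast[A](groups: List[List[A]]): List[A] =
    groups.flatMap { g =>
      withPositions(g).collect { case (a, pos, last) if pos == last => a }
    }

  // Keeps items where position() > n, evaluated per group.
  def posGt[A](groups: List[List[A]], n: Int): List[A] =
    groups.flatMap { g =>
      withPositions(g).collect { case (a, pos, _) if pos > n => a }
    }

  // Element groups of the example document: the root on its own,
  // the children of Elem, and the children of Child2.
  val groups = List(List("Elem"), List("Child", "Child2"), List("Subchild"))

  val lasts = posEqLast(groups)
  val notFirst = posGt(groups, 1)
}
```

The key point is that positions are relative to each context node's own set of matches, not to the flattened overall result.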

Direct Filtering

The xflatMap, xmap, xfilter and filter methods allow extra predicate usage where the existing XPath 1.0 functions don't suffice.

The filter method accepts a simple XmlPath => Boolean, whereas the other varieties work on the matching sets themselves.

These functions are not recommended for general use, as they exist primarily for internal re-use.

Unsorted Results and Views

To meet expected XPath usage, results are sorted in document order and checked for duplicates. If this is not necessary, but speed of matching over a result set is (for example lazy querying over a large set), then the raw functions (either raw or rawLazy) are good choices.

The viewed function, however, uses views as its default type and may help add further lazy evaluation. Whilst tests have shown that lazy evaluation takes place, it's worth profiling your application to see whether it actually impacts performance in the expected fashion.
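The general difference between strict and view-based evaluation can be seen with plain Scala collections. The sketch below uses only the standard library and says nothing about how raw, rawLazy or viewed are implemented; the counters are there purely to show how many elements each variant touches.

```scala
object LazySketch {
  val names = List("Elem", "Child", "Child2", "Subchild", "Other")

  // Strict: map processes every element before headOption is applied.
  var strictCount = 0
  val strictFirst: Option[String] =
    names.map { n => strictCount += 1; n.toUpperCase }.headOption

  // View: map is deferred, so only the elements actually demanded are processed.
  var lazyCount = 0
  val lazyFirst: Option[String] =
    names.view.map { n => lazyCount += 1; n.toUpperCase }.headOption
}
```

Both variants return the same first element, but the strict version does work proportional to the whole collection. Whether that difference matters in practice depends on how much of the result is consumed, which is why profiling is suggested.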

See the XmlPaths trait for more information.

Scales Xml 0.3
