The XPath specification allows navigation of XML documents via a DSL that describes routes through a document using a combination of axes, steps and predicates. It has only a small number of these abstractions, but together they form a powerful and direct query language that remains simple to use.
Scales provides this power via both a traditional string-based approach and an embedded DSL that leverages Scala's syntactical flexibility to mimic the XPath syntax.
The DSL uses the existing Scales abstractions to the full, and works via a zipper over the XmlTree itself. Each navigation step through the tree creates new zippers and new paths through the tree.
Wherever possible (the namespace:: axis being the exception) the range of behaviours closely follows the specification, with like-for-like queries matching 100%. Instead of matching on prefixes, Scales matches against fully qualified expanded QNames (qualifiedName in the QName Functions), so no prefix context is required to evaluate a query.
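The idea of matching on expanded names rather than prefixes can be sketched in plain Scala. This is only an illustration: QN, expanded and sameName are hypothetical names made up for the sketch and are not the Scales QName API.

```scala
// Hypothetical stand-in for a qualified name; NOT the Scales QName type.
final case class QN(namespace: String, local: String, prefix: Option[String] = None) {
  // Expanded form, {namespace}local, in the style shown by qualifiedName
  def expanded: String = s"{$namespace}$local"

  // Names match on namespace and local name only; the prefix plays no part
  def sameName(other: QN): Boolean =
    namespace == other.namespace && local == other.local
}

object QNameSketch {
  val a = QN("test:uri:attribs", "attr3", Some("pre"))
  val b = QN("test:uri:attribs", "attr3", Some("other")) // different prefix
  val c = QN("test:uri", "attr3")                        // different namespace

  val samePrefixIgnored: Boolean = a.sameName(b)
  val differentNamespace: Boolean = a.sameName(c)
}
```

Because the comparison never consults the prefix, no prefix-to-namespace mapping has to be supplied alongside the query.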
Internally, perhaps unsurprisingly, XPath is implemented as a combination of filter, map and flatMap. When retrieving results (e.g. converting to an Iterable) the results are sorted into Document order, which can be expensive for large result sets (see Unsorted Results for alternatives).
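To see why filter, map and flatMap are enough, here is a toy model in plain Scala. It is a sketch under simplifying assumptions (no zipper, no attributes, no text nodes); Node, children and named are hypothetical names and do not reflect the actual Scales internals.

```scala
// Toy model only: Node and the helpers below are hypothetical, not Scales types.
final case class Node(name: String, children: List[Node] = Nil)

object StepSketch {
  // A child step (in the spirit of \*) is a flatMap from each context node to its children
  def children(ctx: List[Node]): List[Node] = ctx.flatMap(_.children)

  // A name test or predicate is a filter over the current node set
  def named(ctx: List[Node], name: String): List[Node] = ctx.filter(_.name == name)

  val doc = Node("Elem", List(
    Node("Child"),
    Node("Child2", List(Node("Subchild")))
  ))

  // Roughly Child2/Subchild from the root, then map to extract a value per match
  val result: List[String] =
    named(children(named(children(List(doc)), "Child2")), "Subchild").map(_.name)
}
```

Each step consumes one node set and produces the next, which is why steps compose in the same way that chained collection operations do.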
Given the following document:
val ns = Namespace("test:uri")
val nsa = Namespace("test:uri:attribs")
val nsp = nsa.prefixed("pre")
val builder =
  ns("Elem") /@ (nsa("pre", "attr1") -> "val1",
                 "attr2" -> "val2",
                 nsp("attr3") -> "val3") /(
    ns("Child"),
    "Mixed Content",
    ns("Child2") /( ns("Subchild") ~> "text" )
  )
we can easily query for the Subchild:
// top produces a Path from a Tree, in this case an XPath
val path = top(builder)
val res = path \* ns("Child2") \* ns("Subchild")
res.size // 1
string(res) // text
qname(res) // Subchild
Scales supports the complete set of useful XPath axes.
A commonly used abbreviation is of course \\, which means descendant_or_self_::. The difference is that \\ also supports possible eager evaluation and, as per the spec, the use of \\ at the beginning of an expression.
NB: The Scales embedded XPath DSL does not support the namespace axis. If you have a requirement for it, it can be looked at (please send an email to the mailing list to discuss possible improvements).
The Scales embedded XPath DSL views the majority of node tests as predicates.
Scales XML also adds:
There are three areas allowing for predicates within XPaths:
The first two are special cased, as in the XPath spec, as they are the most heavily used predicates (using the above example document):
// QName based match
val attributeNamePredicates = path \@ nsp("attr3")
string(attributeNamePredicates) // "val3"
// predicate based match
val attributePredicates = path \@ ( string(_) == "val3" )
qualifiedName(attributePredicates) // {test:uri:attribs}attr3
// Find child descendants that contain a Subchild
val elemsWithASubchild = path \\* ( _ \* ns("Subchild"))
string(elemsWithASubchild) // text
qualifiedName(elemsWithASubchild) // {test:uri}Child2
In each case the XmlPath (or AttributePath) is passed to the predicate. There are also a number of shortcuts for the common QName-based matches and for positional matches on elements:
val second = path \*(2) // path \* 2 is also valid but doesn't read like \*[2]
qname(second) // Child2
The developer can choose to ignore namespaces (not recommended) by using the *:* and *:@ predicates instead (equivalent to the string XPath /*[local-name() = "x"]).
These positional tests, which are more difficult to model, can be used in the same way as position() and last() in XPath.
So, for example:
// /*[position() = last()]
val theLast = path.\.pos_eq_last
qname(theLast) // Elem
// //*[position() = last()]
val allLasts = path.\\*.pos_eq_last
allLasts map(qname(_)) // List(Elem, Child2, Subchild)
// all elems with more than one child
// //*[ ./*[last() > 1]]
val moreThanOne = path.\\*( _.\*.last_>(1) )
qname(moreThanOne) // Elem
// all elems that aren't the first child
// //*[ position() > 1]
val notFirst = path.\\*.pos_>(1)
qname(notFirst) // Child2
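The semantics of position() and last() used above can be illustrated with a small plain-Scala model. This is a hedged sketch, not how Scales implements the tests: El, Positioned and all are hypothetical names, and the model only considers element children.

```scala
// Toy model only: El and Positioned are hypothetical, not Scales types.
final case class El(name: String, children: List[El] = Nil)

object PositionSketch {
  // An element with its 1-based position() among its element siblings
  // and the size of that sibling set, i.e. last()
  final case class Positioned(el: El, pos: Int, last: Int)

  // All elements in document order, each tagged with position and last
  def all(root: El): List[Positioned] = {
    def go(siblings: List[El]): List[Positioned] =
      siblings.zipWithIndex.flatMap { case (e, i) =>
        Positioned(e, i + 1, siblings.size) :: go(e.children)
      }
    go(List(root)) // the root is treated as the only member of its sibling set
  }

  val doc = El("Elem", List(El("Child"), El("Child2", List(El("Subchild")))))

  // Analogue of //*[position() = last()]
  val lasts: List[String] = all(doc).filter(p => p.pos == p.last).map(_.el.name)

  // Analogue of //*[position() > 1]
  val notFirst: List[String] = all(doc).filter(_.pos > 1).map(_.el.name)
}
```

The point of the model is that a positional test needs the whole sibling set, not just the node being tested, which is what makes these tests harder to express as simple per-node predicates.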
The xflatMap, xmap, xfilter and filter methods allow extra predicate usage where the existing XPath 1.0 functions don't suffice.
The filter method accepts a simple XmlPath => Boolean, whereas the other varieties work on the matching sets themselves.
These functions are not recommended for general use, as they exist primarily for internal re-use.
To meet the usage expected of XPath, results are sorted in Document order and checked for duplicates. If this is not necessary, but speed of matching over a result set is (for example, lazy querying over a large set), then the raw functions (either raw or rawLazy) are good choices.
The viewed function, however, uses views as its default type and may help add further lazy evaluation. Whilst tests have shown that lazy evaluation takes place, it's worth profiling your application to see whether it actually impacts performance in the expected fashion.
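The trade-off between ordered and unordered results can be sketched with standard Scala collections. The model below is an assumption made for illustration only: a match is represented by a hypothetical Pos (the child indexes from the root to the node), and nothing here reflects how raw, rawLazy or viewed are actually implemented.

```scala
object DocOrderSketch {
  // Hypothetical stand-in for a path: child indexes from the root to the node
  type Pos = List[Int]

  // Lexicographic comparison of index paths; an ancestor (a prefix) sorts first
  private def comparePos(a: Pos, b: Pos): Int = (a, b) match {
    case (Nil, Nil) => 0
    case (Nil, _)   => -1
    case (_, Nil)   => 1
    case (x :: xs, y :: ys) =>
      if (x != y) Integer.compare(x, y) else comparePos(xs, ys)
  }

  val docOrder: Ordering[Pos] = Ordering.fromLessThan((a, b) => comparePos(a, b) < 0)

  // Matches as a query might produce them: unordered and containing a duplicate
  val rawMatches: List[Pos] = List(List(0, 2, 0), List(0), List(0, 2), List(0, 2, 0))

  // Ordered, de-duplicated results: every match must be produced before the first is returned
  val ordered: List[Pos] = rawMatches.distinct.sorted(docOrder)

  // Unordered, lazy alternative: a view does no filtering until an element is requested
  val firstDeep: Option[Pos] = rawMatches.view.filter(_.length > 1).headOption
}
```

Sorting and de-duplication force the full result set to be materialised, whereas the view stops as soon as the first matching element is found, which is the kind of saving the unsorted variants aim for.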
See the XmlPaths trait for more information.