This is actually a very subtle problem and I think a great question.
My understanding is that an (abbreviated) XPath points to an attribute if and only if its last @ is not within a predicate (something of the form [...]) and has no steps after it (something like /...). I think this gives the relatively simple regular expression @[^]/]*$, that is, there must be an @ that has no ]s or /s after it. Also, if you want to cover unabbreviated XPaths, you can use (@|attribute::)[^]/]*$
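A minimal Python sketch of the two patterns above. The names ABBREVIATED, UNABBREVIATED and points_to_attribute are hypothetical. The ] inside the character class is escaped: Python accepts it unescaped, but some flavors (JavaScript, for example) give [^] a special meaning, so escaping keeps the pattern portable.

```python
import re

# Pattern for abbreviated XPaths: an "@" with no "]" or "/" anywhere after it.
# The "]" is escaped inside the class so the pattern reads the same across flavors.
ABBREVIATED = re.compile(r"@[^\]/]*$")

# Variant that also accepts the unabbreviated "attribute::" axis.
UNABBREVIATED = re.compile(r"(@|attribute::)[^\]/]*$")


def points_to_attribute(xpath: str, allow_unabbreviated: bool = False) -> bool:
    """Heuristic check: does this XPath appear to select attribute nodes?"""
    pattern = UNABBREVIATED if allow_unabbreviated else ABBREVIATED
    return pattern.search(xpath) is not None


if __name__ == "__main__":
    print(points_to_attribute("a/@b"))
    print(points_to_attribute("a[@b]"))
    print(points_to_attribute("a/attribute::b", allow_unabbreviated=True))
```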
I've included a test harness that may prove useful in checking this or other tests. Note also that there may be whitespace between tokens, which can complicate some regexes.
Positive (an attribute)
@* or @a or ../@a or a/@b
a[@b and @c]/@d
a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/@k
Negative (not an attribute)
a[@b] or a[@b and @c]
a[b[@c and @d]/@e]
a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/k[5][@l="m"]
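As a runnable stand-in for the harness (a sketch, not the original one), the Python below feeds every positive and negative example above through the same pattern and reports any misclassification. The names PATTERN, POSITIVE, NEGATIVE and run_harness are hypothetical.

```python
import re

# The answer's pattern: an "@" with no "]" or "/" after it ("]" escaped for portability).
PATTERN = re.compile(r"@[^\]/]*$")

# Examples that should select attribute nodes.
POSITIVE = [
    "@*",
    "@a",
    "../@a",
    "a/@b",
    "a[@b and @c]/@d",
    'a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/@k',
]

# Examples that should NOT select attribute nodes.
NEGATIVE = [
    "a[@b]",
    "a[@b and @c]",
    "a[b[@c and @d]/@e]",
    'a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/k[5][@l="m"]',
]


def run_harness() -> list[str]:
    """Return a description of every example the pattern misclassifies."""
    failures = []
    for xpath in POSITIVE:
        if PATTERN.search(xpath) is None:
            failures.append(f"expected attribute: {xpath}")
    for xpath in NEGATIVE:
        if PATTERN.search(xpath) is not None:
            failures.append(f"expected non-attribute: {xpath}")
    return failures


if __name__ == "__main__":
    problems = run_harness()
    print("all examples pass" if not problems else "\n".join(problems))
```

Adding a new corner case is just a matter of appending it to POSITIVE or NEGATIVE.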
I can't think of a legal example where there is a / but not a ] after the last @, but I think there might be one.
Hopefully these examples make it at least a little clear that there can be arbitrary nesting of [ and ], with @s anywhere in between. Luckily, I think only the very last @ and its nesting level matter.
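Since only the last @ and its nesting level seem to matter, the same rule can also be written as a small scanner that tracks bracket depth instead of using a regex. This is a sketch under stated assumptions: the function name last_at_is_attribute_step is hypothetical, and skipping quoted string literals is an added assumption that keeps brackets inside values like "]" from confusing the depth count.

```python
def last_at_is_attribute_step(xpath: str) -> bool:
    """Heuristic: the last "@" must sit outside every predicate (bracket
    depth 0), and no "/" may follow it at depth 0. Characters inside
    single- or double-quoted string literals are skipped."""
    depth = 0               # current "[" ... "]" nesting level
    quote = None            # quote character of the literal we are inside, if any
    last_at_depth = None    # bracket depth of the most recent "@" seen so far
    slash_after_at = False  # whether a depth-0 "/" appeared after that "@"
    for ch in xpath:
        if quote:
            if ch == quote:
                quote = None  # end of the string literal
            continue
        if ch in "'\"":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "@":
            last_at_depth = depth
            slash_after_at = False  # only slashes after the *last* "@" count
        elif ch == "/" and depth == 0:
            slash_after_at = True
    return last_at_depth == 0 and not slash_after_at
```

One design consequence: because a trailing predicate returns to depth 0 without adding a step, this version accepts something like a/@b[1], which the regex rejects because of the ] after the @.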
(For reference, the OP's regex fails on @a. My original regex failed on a[@b and @c].)
Edit: It turns out that there are more corner cases, which convinces me that there is no perfectly correct regular expression. For example, once you have an attribute node, there are many ways of keeping it, e.g. //@a// or //@a/. in the abbreviated syntax. There are also a variety of more creative ways, such as //@f//[node()]. All in all, it seems that if you want to cover these cases, you need to be able to match [ and ], which a basic regular expression cannot do. On the other hand, you could decide this is too contrived ...
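To make the first corner case concrete, a short Python sketch reusing the answer's pattern (with the ] escaped): in //@a/. the trailing /. is a self step, so it selects the same attribute nodes as //@a, yet the / after the last @ makes the pattern reject it.

```python
import re

# The answer's abbreviated-XPath pattern.
PATTERN = re.compile(r"@[^\]/]*$")

# Both expressions select the same attribute nodes ("." abbreviates self::node()),
# but only the first one matches the pattern.
for xpath in ["//@a", "//@a/."]:
    print(xpath, PATTERN.search(xpath) is not None)
# //@a True
# //@a/. False
```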