Regex: Does an XPATH string point to an attribute?

Here's what I'm using: ".+/@[^/]+$". Can you think of a reason why this might not work?

Best answer

This is actually a very subtle problem and I think a great question.

My understanding is that an (abbreviated) XPATH points to an attribute if and only if its last @ is not within a predicate (something of the form [...]) and has no steps after it (something like /...). I think this gives the relatively simple regular expression @[^]/]*$, that is, there must be an @ with no ] or / after it. If you also want to cover unabbreviated XPATHs, you can use (@|attribute::)[^]/]*$
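As a rough illustration, here is how the two patterns might be wrapped in Python. The function name points_to_attribute is made up for this sketch, and the ] is backslash-escaped inside the character class, which should be equivalent to the unescaped form above. It is only a heuristic under the rule stated in the previous paragraph, not a full XPATH parser.

```python
import re

# Abbreviated syntax only: an '@' with no ']' or '/' anywhere after it.
ABBREVIATED = re.compile(r"@[^\]/]*$")

# Also accept the unabbreviated 'attribute::' axis.
WITH_AXIS = re.compile(r"(@|attribute::)[^\]/]*$")


def points_to_attribute(xpath):
    """Heuristic check (hypothetical helper, not part of any library).

    search() is used rather than match() because the pattern is
    anchored only at the end of the string.
    """
    return WITH_AXIS.search(xpath) is not None


if __name__ == "__main__":
    for xpath in ["a/@b", "a[@b]", "child::a/attribute::b"]:
        print(xpath, points_to_attribute(xpath))
```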

I've included a test harness that may prove useful in checking this or other tests. Note also that there may be whitespace between tokens, which can complicate some regexes.

Positive (an attribute)

  • @* or @a or ../@a or a/@b
  • a[@b and @c]/@d
  • a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/@k

Negative (not an attribute)

  • a[@b] or a[@b and @c]
  • a[b[@c and @d]/@e]
  • a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/k[5][@l="m"]
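The positive and negative cases above can be run mechanically. The following is a minimal sketch in Python; the helper name misclassified is invented for this example, and the two patterns are the one from the question and the abbreviated-syntax one from this answer (with ] escaped).

```python
import re

# Expressions that should be classified as pointing to an attribute.
POSITIVE = [
    "@*", "@a", "../@a", "a/@b",
    "a[@b and @c]/@d",
    'a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/@k',
]

# Expressions that should not.
NEGATIVE = [
    "a[@b]", "a[@b and @c]",
    "a[b[@c and @d]/@e]",
    'a[b[@c="d"]/e[@f and @g]]/h[@i="j"]/k[5][@l="m"]',
]


def misclassified(pattern):
    """Return the test cases on which `pattern` gives the wrong verdict."""
    rx = re.compile(pattern)
    wrong = [x for x in POSITIVE if not rx.search(x)]
    wrong += [x for x in NEGATIVE if rx.search(x)]
    return wrong


if __name__ == "__main__":
    candidates = [
        ("question", r".+/@[^/]+$"),
        ("answer", r"@[^\]/]*$"),
    ]
    for name, pattern in candidates:
        print(name, misclassified(pattern))
```

Any other candidate regex can be checked by passing it to misclassified; an empty list means it agrees with every case listed here, which says nothing about cases outside the list.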

I can't think of a legal example where there is a / but not a ] after the last @, but I think there might be one.

Hopefully these examples make it at least a little clear that there can be arbitrary nesting of [ and ], with @s anywhere in between. Luckily, I think only the very last @ and its nesting level matter.

(For reference, the OP's regex fails on @a. My original regex failed on a[@b and @c].)

Edit: It turns out that there are more corner cases, which convinces me that there is no perfectly correct regular expression. For example, once you have an attribute node, there are many ways of keeping it, e.g. //@a// or //@a/. in the abbreviated syntax. There are also a variety of more creative ways, such as //@f//[node()]. All in all, it seems that if you want to cover these cases, you need to be able to match arbitrarily nested [ and ] pairs, which a basic regular expression cannot do. On the other hand, you could decide this is too contrived ...
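If you do want to track nesting, a small hand-written scanner is one option. The sketch below is only an illustration under simplifying assumptions: the function names are invented, it treats a / as a step separator only when it is outside every [...] and outside quoted strings, and it does not handle whitespace inside attribute :: name or the "keeping the attribute" cases such as //@a/. mentioned above.

```python
def last_top_level_step(xpath):
    """Return the text after the last '/' that lies outside every
    predicate and string literal (hypothetical helper)."""
    depth = 0      # current nesting level of [...] predicates
    quote = None   # the open quote character while inside a literal
    start = 0      # index where the current top-level step begins
    for i, ch in enumerate(xpath):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "/" and depth == 0:
            start = i + 1
    return xpath[start:].strip()


def is_attribute_step(xpath):
    """True if the last top-level step selects along the attribute axis."""
    step = last_top_level_step(xpath)
    return step.startswith("@") or step.startswith("attribute::")


if __name__ == "__main__":
    for xpath in ["a[@b and @c]/@d", "a[b[@c and @d]/@e]", '@a[.="x"]']:
        print(xpath, is_attribute_step(xpath))
```

Unlike the regex, this accepts an attribute step that carries its own predicate, such as @a[.="x"], because the ] there belongs to the last step rather than enclosing it.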
