Modify regular expression for a list of numbers and numeric range expressions
  • Date: 2009-11-16 18:11:23
  • Tags: regex, extjs

I am using ExtJS. One of the text fields built with an ExtJS component should allow comma-separated number/operator strings like the following three examples:

1, 2-3, 4..5, <6, <=7, >8, >=9 
>2, 3..5, >=9,>10
<=9, 1, <=8, 4..5, 8-9

Here I am using equality, range (-), sequence (..), and less-than/greater-than (optionally "or equal to") operators on numbers less than or equal to 100. The terms are separated by commas.

What can be a regular expression for this type of string?

For my previous question I got a solution from "dlamblin": ^(?:\d+(?:(?:\.\.|-)\d+)?|[<>]=?\d+)(?:,\s*\d+(?:(?:\.\.|-)\d+)?|[<>]=?\d+)*$

This works perfectly for all patterns, with these exceptions:

  1. Relational operators (<, <=, >, >=) are accepted only in the first element of the string. E.g. <=3, 4-5, 6, 7..8 matches, but <=3, 4-5, 6, 7..8, >=5 does not, because the relational operator is not in the first element.

  2. The string <3<4, 5, 9-4 is accepted without error, although a comma is required between <3 and <4.

  3. Numbers in the string should be less than or equal to 100, e.g. <100, 0-100, 99..100.

  4. Leading zeros (like 003, 099) should not be allowed.
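
The first two failures can be reproduced with a short JavaScript check (JavaScript is assumed here because the field lives in ExtJS; the helper name `matchesOriginal` is made up for illustration). The behaviour appears to come from the last group, where `|` separates `,\s*\d+...` from `[<>]=?\d+`, so the comma belongs only to the plain-number alternative:

```javascript
// The pattern from dlamblin's earlier answer, written as a JavaScript literal.
const original = /^(?:\d+(?:(?:\.\.|-)\d+)?|[<>]=?\d+)(?:,\s*\d+(?:(?:\.\.|-)\d+)?|[<>]=?\d+)*$/;

// Hypothetical helper, used only for this illustration.
function matchesOriginal(text) {
  return original.test(text);
}

console.log(matchesOriginal("<=3, 4-5, 6, 7..8"));      // true: operator in first element
console.log(matchesOriginal("<=3, 4-5, 6, 7..8, >=5")); // false: operator in a later element
console.log(matchesOriginal("<3<4, 5, 9-4"));           // true: missing comma goes unnoticed
```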

Answers

Scrap that and use a tokenizer instead. Split up the string by commas, then look at each token and decide (possibly using a regular expression) which type of relationship it is. If it's none of the existing relationships, it's invalid. If any relationship contains a number that's too big, it's invalid.

For the sake of your sanity and the people who will have to maintain this code after you're done with it, don't use regular expressions to validate such a complicated interrelated set of rules. Break it down into simpler chunks.
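
A minimal sketch of that tokenizer idea in JavaScript, assuming whitespace is allowed only around commas. The names `MAX_VALUE`, `isValidNumber`, `isValidToken`, and `isValidList` are hypothetical, chosen for this example:

```javascript
const MAX_VALUE = 100; // upper bound stated in the question

// A number is valid if it has no leading zeros and is at most MAX_VALUE.
function isValidNumber(digits) {
  return /^(?:0|[1-9][0-9]*)$/.test(digits) && Number(digits) <= MAX_VALUE;
}

// Classify one token as a comparison, a range/sequence, or a single number.
function isValidToken(token) {
  let m = /^(?:<=|>=|<|>)([0-9]+)$/.exec(token);
  if (m) return isValidNumber(m[1]);
  m = /^([0-9]+)(?:\.\.|-)([0-9]+)$/.exec(token);
  if (m) return isValidNumber(m[1]) && isValidNumber(m[2]);
  return isValidNumber(token);
}

// Split on commas, trim each piece, and require every token to be valid.
function isValidList(text) {
  return text.split(",").every(function (part) {
    return isValidToken(part.trim());
  });
}

console.log(isValidList("<=3, 4-5, 6, 7..8, >=5")); // true
console.log(isValidList("<3<4, 5, 9-4"));           // false
```

Each rule lives in its own small function, so changing the limit or adding an operator touches one place only. An empty token (as in `1,,2`) is rejected because it is not a valid number.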

Welbog's advice to use a tokenizer is the sane option.

If you have some other constraint that forces a regular expression, you can use

^(<|<=|>|>=)?\s*(100|0|[1-9]\d?)((\.\.|-)(100|0|[1-9]\d?))?(,\s*(<|<=|>|>=)?\s*(100|0|[1-9]\d?)((\.\.|-)(100|0|[1-9]\d?))?)*$

That's the result of manually expanding the following:

num   = (100|0|[1-9]\d?)
op    = (<|<=|>|>=)
range = op?\s*num((\.\.|-)num)?
expr  = ^range(,\s*range)*$
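
Rather than expanding by hand, the same pieces can be assembled in code. This is a sketch in JavaScript that reuses the names num, op, range, and expr from the definitions above; as an assumption of this example, it uses non-capturing groups and lists `<=` and `>=` before `<` and `>`, which should not change what is matched:

```javascript
// Named pieces, mirroring the num / op / range / expr definitions.
const num = "(?:100|0|[1-9]\\d?)";          // 0..100 without leading zeros
const op = "(?:<=|>=|<|>)";                 // relational operator
const range = op + "?\\s*" + num + "(?:(?:\\.\\.|-)" + num + ")?";
const expr = new RegExp("^" + range + "(?:,\\s*" + range + ")*$");

console.log(expr.test("1, 2-3, 4..5, <6, <=7, >8, >=9")); // true
console.log(expr.test("<3<4, 5, 9-4"));                   // false
console.log(expr.test("003"));                            // false
```

As in the grammar above, an operator may precede a range, so a term such as `<3-5` is accepted.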

I agree with Welbog that pre/post-processing would be a better choice.

But since I like regular expressions, here is my solution.

^[ \t]*(?:(?:0|[1-9][0-9]?|100)(?:(?:-|\.\.)(?:0|[1-9][0-9]?|100))?|(?:[<>]=?)(?:0|[1-9][0-9]?|100))(?:[ \t]*,[ \t]*(?:(?:0|[1-9][0-9]?|100)(?:(?:-|\.\.)(?:0|[1-9][0-9]?|100))?|(?:[<>]=?)(?:0|[1-9][0-9]?|100)))*[ \t]*$

\s is not used because in some engines it also matches newlines.

\d is not used because [1-9] is needed anyway, so [0-9] is easier to read alongside it.

(?:0|[1-9][0-9]?|100) will match a number from 0 to 100 without leading zeros.

(?:[<>]=?)(?:0|[1-9][0-9]?|100) will match a condition followed by a number (if you want to match = too, just adjust it).

(?:0|[1-9][0-9]?|100)(?:(?:-|\.\.)(?:0|[1-9][0-9]?|100))? will match a number with an optional range or sequence.
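
The number sub-pattern can be checked in isolation. A small JavaScript sketch, with sample inputs chosen only for illustration and anchors added so the whole string must match:

```javascript
// The number sub-pattern on its own, anchored so it must match the whole string.
const numberOnly = /^(?:0|[1-9][0-9]?|100)$/;

// Hypothetical sample inputs, chosen for illustration.
const samples = ["0", "7", "99", "100", "003", "099", "101"];
const accepted = samples.filter(function (s) {
  return numberOnly.test(s);
});

console.log(accepted.join(" ")); // 0 7 99 100
```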

Full explanation:

^
[ \t]*  // Prefix spaces
(?: // A valid term
    // A number
    (?:0|[1-9][0-9]?|100)
    // Optional range or sequence
    (?:
        (?:-|\.\.)
        (?:0|[1-9][0-9]?|100)
    )?
    |
    // Condition and number
    (?:[<>]=?)(?:0|[1-9][0-9]?|100)
)
(?: // Other terms
    [ \t]*,[ \t]*   // Comma with prefix and suffix spaces
    (?: // A valid term
        // A number
        (?:0|[1-9][0-9]?|100)
        // Optional range or sequence
        (?:
            (?:-|\.\.)
            (?:0|[1-9][0-9]?|100)
        )?
        |
        // Condition and number
        (?:[<>]=?)(?:0|[1-9][0-9]?|100)
    )
)*
[ \t]*  // Tail spaces
$

I tested it with Eclipse's regex search and it works.

Hope this helps.

This should work:

^(?:(?:\s*((?:<|>|<=|>=)?(?:[1-9]|[1-9]\d|100))\s*(?:,|$))|(?:\s*((?:[1-9]|[1-9]\d|100)(?:\.\.|-)(?:[1-9]|[1-9]\d|100))\s*(?:,|$)))*$

(You'll need to use the "multiline" option, obviously.)

If you have the advantage of a regex engine that supports the "ignore whitespace" option, then you could break it up like this:

^                            # beginning of line
(?:
  (?:
    \s*                      # any whitespace
    (                        # capture group
      (?:<|>|<=|>=)?         # inequality
      (?:[1-9]|[1-9]\d|100)  # single value
    )
    \s*                      # any whitespace
    (?:,|$)                  # comma or end of line
  )
  |
  (?:
    \s*                      # any whitespace
    (                        # capture group
      (?:[1-9]|[1-9]\d|100)  # single value
      (?:\.\.|-)             # range modifier
      (?:[1-9]|[1-9]\d|100)  # single value
    )
    \s*                      # any whitespace
    (?:,|$)                  # comma or end of line
  )
)+                           # one or more of all this
$                            # end of line

As you can see, it matches your examples in Expresso:

http://imgur.com/5ctQS.png




