Uncommon regular expressions [closed]
  • Date: 2009-11-13 14:27:44
  • Tags: regex
Answers

I wouldn't call them secret.

If you're serious about learning regex, the (already mentioned) online resource http://www.regular-expressions.info should be in your bookmarks, and Friedl's Mastering Regular Expressions (Third Edition) should be on your bookshelf.

I think the entire regular-expressions.info site is a good, if not so "secret", trick. :) It has those, on the advanced page.

Your discoveries are non-capturing groups (?:...) and negative look-ahead assertions (?!...). There aren't any "secret" regex tricks, but there are many features that you may not know about. I recommend a thorough reading of perlre.

Not a secret regex trick, but a good recommendation: the book Regular Expressions Cookbook, published by O'Reilly: http://www.amazon.com/dp/0596520689

It helps to test your code before you post it. I ran, in Perl:


# The look-ahead consumes nothing, so " likes" would have to follow "My " directly.
if ( "My cat likes green birds" =~ m/My (?!dog) likes .+ / ) {
    print( "Match => \"$1\"\n" );
} else {
    print( "No match\n" );
}

and it output No match. On the other hand:


# (.+?) consumes the word after the look-ahead and captures it in $1.
if ( "My cat likes green birds" =~ m/My (?!dog)(.+?) likes .+ / ) {
    print( "Match => \"$1\"\n" );
} else {
    print( "No match\n" );
}

outputs Match => "cat".

Try running your code sometimes. You'll be amazed at how much a test run clears up your understanding of a topic.
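The same experiment carries over to other engines. Below is a minimal sketch in Python's `re` module (my choice of language, not the original poster's); the variable names are made up for illustration, and the third test string is an added check that the look-ahead really rejects "dog".

```python
import re

text = "My cat likes green birds"

# The look-ahead consumes nothing, so " likes" would have to follow "My " directly.
no_capture = re.search(r"My (?!dog) likes .+ ", text)

# Adding (.+?) after the look-ahead consumes the word and captures it.
with_capture = re.search(r"My (?!dog)(.+?) likes .+ ", text)

# The look-ahead rejects the match when "dog" comes right after "My ".
dog_case = re.search(r"My (?!dog)(.+?) likes .+ ", "My dog likes green birds")

print(no_capture)             # None
print(with_capture.group(1))  # cat
print(dog_case)               # None
```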

I guess it is all a secret if you never look at the docs installed on your computer along with Perl:

Start with

$ perldoc perlre

There is no need for the rest of us to post bits and pieces of the docs here as answers. Besides, your explanations of both patterns are wrong:

(?:pattern)
(?imsx-imsx:pattern)

This is for clustering, not capturing; it groups subexpressions like "()", but doesn't make backreferences as "()" does.
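To see the difference between clustering and capturing, here is a small sketch in Python's `re` module rather than Perl. The sample strings and variable names are invented for illustration, and the scoped-flag form `(?i:...)` assumes Python 3.6 or later.

```python
import re

# Capturing parentheses: both groups are stored.
capturing = re.match(r"(foo|bar)-(\d+)", "bar-42")

# Non-capturing (?:...): the alternation is grouped but not stored.
clustering = re.match(r"(?:foo|bar)-(\d+)", "bar-42")

print(capturing.groups())   # ('bar', '42')
print(clustering.groups())  # ('42',)

# Scoped flags: case-insensitivity applies only inside the group.
scoped_hit = re.match(r"(?i:foo)bar", "FOObar")
scoped_miss = re.match(r"(?i:foo)bar", "FOOBAR")

print(scoped_hit is not None)  # True
print(scoped_miss)             # None
```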

(?!pattern)

A zero-width negative look-ahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar". Note however that look-ahead and look-behind are NOT the same thing.
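A short sketch of both points, again in Python's `re` module with made-up sample text: the first part finds each "foo" that is not followed by "bar", and the second shows why a look-ahead placed in front of "bar" does not behave like a look-behind.

```python
import re

text = "foobar foobaz foo"

# Start offsets of every "foo" that is not followed by "bar".
starts = [m.start() for m in re.finditer(r"foo(?!bar)", text)]
print(starts)  # [7, 14]

# (?!foo)bar asks "is the text starting here not foo?", which is always
# true where "bar" begins, so it still matches inside "foobar".
lookahead_hits = re.findall(r"(?!foo)bar", "foobar")

# (?<!foo)bar looks backwards and rejects "bar" when "foo" precedes it.
lookbehind_hits = re.findall(r"(?<!foo)bar", "foobar")

print(lookahead_hits)   # ['bar']
print(lookbehind_hits)  # []
```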

Well, it's up to you to decide what's rare. Get a program like RegexBuddy, which has drop-down lists from which you can build expressions by specifying different criteria, and see if there's anything in those lists that you haven't heard of before =)

Did you know, say, that you can have named capturing groups? For example, the group

 (?<Awesome>.*?)

would be fetched by the name Awesome rather than by a numeric index.
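A minimal sketch of fetching a group by name, using Python's `re` module. Note that Python spells the construct `(?P<name>...)` rather than the `(?<name>...)` form shown above (which Perl 5.10+ and .NET accept). The trailing comma in the pattern and the sample string are assumptions added so the lazy `.*?` has something to stop at.

```python
import re

# Python spelling of a named group; the comma gives the lazy .*? a stopping point.
m = re.search(r"(?P<Awesome>.*?),", "first,second")

by_name = m.group("Awesome")
by_index = m.group(1)  # the same group is still reachable by number

print(by_name)        # first
print(by_index)       # first
print(m.groupdict())  # {'Awesome': 'first'}
```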

Other than that, I'll add that your second example is a negative lookahead. It says that the text that follows must not be "dog". So "my dog likes green birds" would not match. But perhaps that's what you meant; that was a bit unclear from reading your post =)

In vim, this line will remove all XML comments, single or multi line:

:%s/<!--\_.\{-}-->//g

The \_. is like a dot that matches newlines too. The \{-} is the non-greedy star, like *? in Perl-style regexes.
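For comparison, the same comment-stripping idea can be sketched outside Vim. The version below uses Python's `re` module, where `re.DOTALL` plays the role of `\_.` and `*?` is the non-greedy star; the XML snippet is invented for illustration. Like the Vim command, this is a plain text substitution and not an XML parser.

```python
import re

xml = """<root>
<!-- single line -->
<item>1</item>
<!-- multi
     line -->
<item>2</item>
</root>"""

# Non-greedy: each comment is removed on its own.
stripped = re.sub(r"<!--.*?-->", "", xml, flags=re.DOTALL)

# Greedy, for contrast: everything from the first <!-- to the last --> goes.
greedy = re.sub(r"<!--.*-->", "", xml, flags=re.DOTALL)

print(stripped)
print("<item>1</item>" in greedy)  # False
```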

Not so secret you can test your regexes online at





Related questions
Uncommon regular expressions [closed]

Recently I discovered two amazing regular expression features: ?: and ?!. I am curious about other neat regex features, so maybe you would like to share some tricky regular expressions.
