How can I get Python's isidentifier() functionality in Python 2.6?

Python 3 has a string method called str.isidentifier.

How can I get similar functionality in Python 2.6, short of rewriting my own regex, etc.?

Answers

The tokenize module defines a regexp called Name:

import re, tokenize, keyword
re.match(tokenize.Name + '$', somestr) and not keyword.iskeyword(somestr)

Invalid Identifier Validation


All of the answers in this thread seem to repeat the same validation mistake: they accept some strings that are not valid identifiers.

The regex patterns suggested in the other answers are built from tokenize.Name, which holds the regex pattern [a-zA-Z_]\w* (on Python 2.7.15), followed by the $ regex anchor.

Please refer to the official Python 3 description of identifiers and keywords (which contains a paragraph that is relevant to Python 2 as well).

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.

Thus 'foo\n' should not be considered a valid identifier.
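The ASCII rule quoted above can also be checked character by character, without any regex. The following is only a minimal sketch: the name is_ascii_identifier is hypothetical, it assumes ASCII-only identifiers as in Python 2, and it does not reject reserved words.

```python
import string

# Characters permitted by the ASCII identifier rule (Python 2.x).
_FIRST_CHARS = set(string.ascii_letters + '_')
_OTHER_CHARS = _FIRST_CHARS | set(string.digits)


def is_ascii_identifier(candidate):
    """Hypothetical helper: True if candidate follows the ASCII identifier rule."""
    if not candidate:
        return False
    # The first character may be a letter or underscore, but not a digit.
    if candidate[0] not in _FIRST_CHARS:
        return False
    # Every remaining character may be a letter, digit, or underscore.
    return all(ch in _OTHER_CHARS for ch in candidate[1:])
```

Because every character is checked explicitly, a trailing newline is rejected like any other character that is not allowed.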

While one may argue that this code is functional:

>>> class Foo():
...     pass
>>> f = Foo()
>>> setattr(f, 'foo\n', 'bar')
>>> dir(f)
['__doc__', '__module__', 'foo\n']
>>> print getattr(f, 'foo\n')
bar

Although the newline character is a valid ASCII character, it is not a letter. Furthermore, there is clearly no practical use for an identifier that ends with a newline character:

>>> f.foo\n

SyntaxError: unexpected character after line continuation character

The str.isidentifier function also confirms this is an invalid identifier:

Python 3 interpreter:

>>> print('foo\n'.isidentifier())
False

The $ anchor vs the \Z anchor


Quoting the official Python 2 Regular Expression syntax:

$

Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches 'foo2' normally, but 'foo1' in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
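The behaviour described in this quotation can be reproduced with the re module. This is a small illustrative sketch; the variable names are arbitrary and are not taken from the documentation.

```python
import re

text = 'foo1\nfoo2\n'

# Without MULTILINE, $ matches only at the very end of the string
# or just before a final newline.
normal = re.search(r'foo.$', text).group()

# With MULTILINE, $ also matches before every newline.
multiline = re.search(r'foo.$', text, re.MULTILINE).group()

# A bare $ matches twice in 'foo\n': once just before the newline
# (index 3) and once at the end of the string (index 4).
dollar_positions = [m.start() for m in re.finditer(r'$', 'foo\n')]

print(normal)            # foo2
print(multiline)         # foo1
print(dollar_positions)  # [3, 4]
```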

As a result, a string that ends with a newline matches as a valid identifier:

>>> import tokenize
>>> import re
>>> m = re.match(tokenize.Name + '$', 'foo\n')
>>> m
<_sre.SRE_Match at 0x3eac8e0>
>>> m.group()
'foo'

The regex pattern should not use the $ anchor; \Z is the anchor that should be used instead. Quoting once again:

\Z

Matches only at the end of the string.

With \Z, the regex correctly rejects the string:

>>> re.match(tokenize.Name + r'\Z', 'foo\n') is None
True
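Combining the \Z anchor with a keyword check gives something like the sketch below. The assumptions here are mine: the name is_py2_identifier is hypothetical, and the ASCII pattern is written out by hand instead of being taken from tokenize.Name, because on Python 3 tokenize.Name is \w+ and would also accept a leading digit.

```python
import keyword
import re

# ASCII identifier pattern anchored with \Z, so that a trailing
# newline is not silently accepted the way it is with $.
_IDENTIFIER = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*\Z')


def is_py2_identifier(candidate):
    """Hypothetical helper: ASCII identifier that is not a reserved word."""
    # The keyword list is that of the interpreter running this code.
    if keyword.iskeyword(candidate):
        return False
    return _IDENTIFIER.match(candidate) is not None
```

re.match already anchors at the start of the string, so no leading ^ is needed.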

Dangerous Implications


See Luke's answer for another example of how this kind of weak regex matching could have more dangerous implications in other circumstances.

Further Reading


Python 3 added support for non-ASCII identifiers; see PEP 3131.

re.match(r'[a-z_]\w*$', s, re.I)

should do nicely. As far as I know there isn't any built-in method.

Good answers so far. I'd write it like this.

import keyword
import re

def isidentifier(candidate):
    "Is the candidate string an identifier in Python 2.x"
    is_not_keyword = candidate not in keyword.kwlist
    pattern = re.compile(r'^[a-z_][a-z0-9_]*$', re.I)
    matches_pattern = bool(pattern.match(candidate))
    return is_not_keyword and matches_pattern

In Python < 3.0 this is quite easy, as you can't have Unicode characters in identifiers. This should do the job:

import re
import keyword

def isidentifier(s):
    if s in keyword.kwlist:
        return False
    return re.match(r'^[a-z_][a-z0-9_]*$', s, re.I) is not None

I've decided to take another crack at this, since there have been several good suggestions. I'll try to consolidate them. The following can be saved as a Python module and run directly from the command line. If run, it tests the function, so it is provably correct (at least to the extent that the documentation demonstrates the capability).

import keyword
import re
import tokenize

def isidentifier(candidate):
    """
    Is the candidate string an identifier in Python 2.x
    Return true if candidate is an identifier.
    Return false if candidate is a string, but not an identifier.
    Raises TypeError when candidate is not a string.

    >>> isidentifier('foo')
    True

    >>> isidentifier('print')
    False

    >>> isidentifier('Print')
    True

    >>> isidentifier(u'Unicode_type_ok')
    True

    # unicode symbols are not allowed, though.
    >>> isidentifier(u'Unicode_content_\u00a9')
    False

    >>> isidentifier('not')
    False

    >>> isidentifier('re')
    True

    >>> isidentifier(object)
    Traceback (most recent call last):
    ...
    TypeError: expected string or buffer
    """
    # test if candidate is a keyword
    is_not_keyword = candidate not in keyword.kwlist
    # create a pattern based on tokenize.Name
    pattern_text = '^{tokenize.Name}$'.format(**globals())
    # compile the pattern
    pattern = re.compile(pattern_text)
    # test whether the pattern matches
    matches_pattern = bool(pattern.match(candidate))
    # return true only if the candidate is not a keyword and the pattern matches
    return is_not_keyword and matches_pattern

def test():
    import unittest
    import doctest
    suite = unittest.TestSuite()
    suite.addTest(doctest.DocTestSuite())
    runner = unittest.TextTestRunner()
    runner.run(suite)

if __name__ == '__main__':
    test()

What I am using:

import keyword
import re
import tokenize

def is_valid_keyword_arg(k):
    """
    Return True if the string k can be used as the name of a valid
    Python keyword argument, otherwise return False.
    """
    # Don't allow Python reserved words as arg names
    if k in keyword.kwlist:
        return False
    return re.match('^' + tokenize.Name + '$', k) is not None

All of the solutions proposed so far either do not support Unicode or allow a digit as the first character when run on Python 3.

Edit: the proposed solutions should only be used on Python 2; on Python 3, str.isidentifier should be used. Here is a solution that should work anywhere:

re.match(r'^\w+$', name, re.UNICODE) and not name[0].isdigit()

Basically, it tests whether the string consists of at least one word character (letters, digits, or underscore), and then checks that the first character is not a digit.
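Since str.isidentifier is the recommended check on Python 3, one way to cover both versions is to call it when it exists and fall back to a regex otherwise. This is only a sketch under my own assumptions: the name isidentifier_compat is hypothetical, and the fallback uses an ASCII-only pattern anchored with \Z, which may or may not be what a given Python 2 caller wants.

```python
import keyword
import re

# Fallback for interpreters without str.isidentifier (Python 2).
_ASCII_IDENTIFIER = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*\Z')


def isidentifier_compat(name):
    """Hypothetical helper: identifier check for Python 2 and Python 3."""
    # str.isidentifier returns True for reserved words such as 'class',
    # so keywords are rejected separately.
    if keyword.iskeyword(name):
        return False
    native = getattr(name, 'isidentifier', None)
    if native is not None:
        # Python 3: defer to the built-in, which handles Unicode.
        return native()
    # Python 2: ASCII-only rule.
    return _ASCII_IDENTIFIER.match(name) is not None
```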




