Python 3 has a string method called str.isidentifier
How can I get similar functionality in Python 2.6, short of rewriting my own regex, etc.?
The tokenize module defines a regexp called Name:
import re, tokenize, keyword
re.match(tokenize.Name + '$', somestr) and not keyword.iskeyword(somestr)
All of the answers in this thread seem to repeat the same validation mistake, which lets strings that are not valid identifiers match as if they were.
The regex patterns suggested in the other answers are built from tokenize.Name, which holds the regex pattern [a-zA-Z_]\w* (on Python 2.7.15), followed by the $ regex anchor.
Please refer to the official python 3 description of the identifiers and keywords (which contains a paragraph that is relevant to python 2 as well).
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.
Thus 'foo\n' should not be considered a valid identifier.
While one may argue that this code is functional:
>>> class Foo():
...     pass
>>> f = Foo()
>>> setattr(f, 'foo\n', 'bar')
>>> dir(f)
['__doc__', '__module__', 'foo\n']
>>> print getattr(f, 'foo\n')
bar
Although the newline character is indeed a valid ASCII character, it is not considered a letter. Furthermore, there is clearly no practical use for an identifier that ends with a newline character:
>>> f.foo\n
SyntaxError: unexpected character after line continuation character
The str.isidentifier function also confirms that this is an invalid identifier.
Python 3 interpreter:
>>> print('foo\n'.isidentifier())
False
$ anchor vs the \Z anchor
Quoting the official Python 2 Regular Expression syntax:
$
Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
As a result, a string that ends with a newline is matched as a valid identifier:
>>> import tokenize
>>> import re
>>> m = re.match(tokenize.Name + '$', 'foo\n')
>>> m
<_sre.SRE_Match at 0x3eac8e0>
>>> print m.group()
foo
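The behaviour of $ described in the quoted documentation can be checked directly. The following is a small sketch using only the re module; the variable names are chosen here for illustration, and the sample strings are the ones from the documentation quote.

```python
import re

text = 'foo1\nfoo2\n'

# Default mode: $ matches at the very end of the string, or just before
# a newline that is the last character of the string.
default_match = re.search(r'foo.$', text).group()

# MULTILINE mode: $ also matches before every newline, so the first line matches.
multiline_match = re.search(r'foo.$', text, re.MULTILINE).group()

# A bare $ finds two empty matches in 'foo\n':
# one just before the newline and one at the end of the string.
dollar_positions = [m.start() for m in re.finditer(r'$', 'foo\n')]

print(default_match)     # foo2
print(multiline_match)   # foo1
print(dollar_positions)  # [3, 4]
```

The match at position 3 is the one that lets 'foo\n' pass an identifier check anchored with $.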
The regex pattern should not use the $ anchor; \Z is the anchor that should be used instead.
Quoting once again:
\Z
Matches only at the end of the string.
And now the regex is a valid one:
>>> re.match(tokenize.Name + r'\Z', 'foo\n') is None
True
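Putting the pieces together, a complete check might look like the sketch below. It assumes the Python 2 rule quoted above (ASCII letters, digits and underscore only) and spells the character classes out explicitly instead of reusing tokenize.Name, since the value of that constant depends on the Python version. The name is_py2_identifier is hypothetical and not part of any library.

```python
import keyword
import re

# Python 2 identifier rule: an ASCII letter or underscore first, then ASCII
# letters, digits, or underscores. \Z anchors at the true end of the string,
# so a trailing newline is rejected (unlike with $).
_PY2_IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')


def is_py2_identifier(candidate):
    """Hypothetical helper: True if candidate is a non-keyword ASCII identifier."""
    if keyword.iskeyword(candidate):
        return False
    return _PY2_IDENTIFIER.match(candidate) is not None


print(is_py2_identifier('foo'))     # True
print(is_py2_identifier('foo\n'))   # False
print(is_py2_identifier('9lives'))  # False
print(is_py2_identifier('for'))     # False
```

Note that the keyword list itself differs between versions (print is a keyword in Python 2 but not in Python 3), so the keyword part of the answer depends on the interpreter running the check.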
See Luke's answer for another example of how this kind of weak regex matching could have more dangerous implications in other circumstances.
Python 3 added support for non-ASCII identifiers; see PEP 3131.
re.match(r'[a-z_]\w*$', s, re.I)
should do nicely. As far as I know there isn't any built-in method.
Good answers so far. I'd write it like this.
import keyword
import re
def isidentifier(candidate):
    "Is the candidate string an identifier in Python 2.x"
    is_not_keyword = candidate not in keyword.kwlist
    pattern = re.compile(r'^[a-z_][a-z0-9_]*$', re.I)
    matches_pattern = bool(pattern.match(candidate))
    return is_not_keyword and matches_pattern
In Python < 3.0 this is quite easy, as you can't have Unicode characters in identifiers. This should do the job:
import re
import keyword
def isidentifier(s):
    if s in keyword.kwlist:
        return False
    return re.match(r'^[a-z_][a-z0-9_]*$', s, re.I) is not None
I've decided to take another crack at this, since there have been several good suggestions. I'll try to consolidate them. The following can be saved as a Python module and run directly from the command line. When run, it tests the function against its own doctests, so it is shown to be correct at least to the extent that the documented examples exercise it.
import keyword
import re
import tokenize

def isidentifier(candidate):
    """
    Is the candidate string an identifier in Python 2.x
    Return true if candidate is an identifier.
    Return false if candidate is a string, but not an identifier.
    Raises TypeError when candidate is not a string.

    >>> isidentifier('foo')
    True
    >>> isidentifier('print')
    False
    >>> isidentifier('Print')
    True
    >>> isidentifier(u'Unicode_type_ok')
    True

    # unicode symbols are not allowed, though.
    >>> isidentifier(u'Unicode_content_\u00a9')
    False
    >>> isidentifier('not')
    False
    >>> isidentifier('re')
    True
    >>> isidentifier(object)
    Traceback (most recent call last):
    ...
    TypeError: expected string or buffer
    """
    # test if candidate is a keyword
    is_not_keyword = candidate not in keyword.kwlist
    # create a pattern based on tokenize.Name
    pattern_text = '^{tokenize.Name}$'.format(**globals())
    # compile the pattern
    pattern = re.compile(pattern_text)
    # test whether the pattern matches
    matches_pattern = bool(pattern.match(candidate))
    # return true only if the candidate is not a keyword and the pattern matches
    return is_not_keyword and matches_pattern

def test():
    import unittest
    import doctest
    suite = unittest.TestSuite()
    suite.addTest(doctest.DocTestSuite())
    runner = unittest.TextTestRunner()
    runner.run(suite)

if __name__ == '__main__':
    test()
What I am using:
import keyword
import re
import tokenize

def is_valid_keyword_arg(k):
    """
    Return True if the string k can be used as the name of a valid
    Python keyword argument, otherwise return False.
    """
    # Don't allow Python reserved words as arg names
    if k in keyword.kwlist:
        return False
    return re.match('^' + tokenize.Name + '$', k) is not None
All solutions proposed so far either do not support Unicode or allow a digit as the first character when run on Python 3.
Edit: the proposed solutions should only be used on Python 2; on Python 3, isidentifier should be used. Here is a solution that should work anywhere:
re.match(r'^\w+$', name, re.UNICODE) and not name[0].isdigit()
Basically, it tests whether the string consists of at least one word character (letters, digits, or underscore), and then checks that the first character is not a digit.
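One way to follow the advice above (isidentifier on Python 3, a regex on Python 2) in a single code path is a small dispatch helper. This is only a sketch: the name is_identifier is hypothetical, the fallback assumes the ASCII-only Python 2 rules, and it anchors with \Z rather than $ so that a trailing newline is rejected.

```python
import keyword
import re

# ASCII-only fallback for interpreters whose strings lack isidentifier (Python 2).
_ASCII_IDENTIFIER = re.compile(r'[A-Za-z_][A-Za-z0-9_]*\Z')


def is_identifier(name):
    """Hypothetical helper: True if name is a non-keyword identifier
    according to the rules of the interpreter running this code."""
    if keyword.iskeyword(name):
        return False
    # Python 3 strings provide isidentifier, which also covers Unicode names.
    if hasattr(name, 'isidentifier'):
        return name.isidentifier()
    # Otherwise fall back to the Python 2 rule.
    return _ASCII_IDENTIFIER.match(name) is not None


print(is_identifier('foo'))    # True
print(is_identifier('1foo'))   # False
print(is_identifier('foo\n'))  # False
print(is_identifier('for'))    # False
```

Because the result follows the running interpreter, a name with non-ASCII letters is accepted on Python 3 and rejected by the fallback on Python 2.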