Our SD Source Code Search Engine (SCSE) can provide this result directly.
The SCSE provides a way to search extremely quickly across large sets of files using some of the language structure to enable precise queries and minimize false positives. It handles a wide array
of languages, even at the same time, including Python. A GUI shows search hits and a page of actual text from the file containing a selected hit.
It uses lexical information from the source languages as the basis for queries, which are composed of language keywords and pattern tokens that match language elements with varying content. SCSE knows the types of lexemes available in the language. One can search for a generic identifier (using query token I) or for an identifier matching some regular expression. Similarly, one can search for a generic string (using query token "S" for "any kind of string literal") or for a specific type of string (for Python these include "UnicodeStrings", non-unicode strings, etc., which collectively make up the set of Python things comprising "S").
So a search:
for ... I=ij*
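finds the keyword "for" near ("...") an identifier whose prefix is "ij", and shows you all the hits. (Language-specific whitespace, including line breaks and comments, is ignored.)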
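To make the identifier part of that query concrete, here is a minimal sketch in plain Python of what "an identifier whose prefix is ij" means at the lexical level. It is not SCSE and does not reproduce SCSE's query engine or its "near" operator; it uses the standard `tokenize` module as a stand-in lexer, and the function name `identifiers` and the glob-style pattern argument are assumptions made for illustration.

```python
import fnmatch
import io
import keyword
import tokenize


def identifiers(source, pattern="*"):
    """Yield (line, name) for each identifier in `source` matching `pattern`.

    Hypothetical helper: works on tokens, so comments and text inside
    string literals can never produce a false hit.
    """
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        # NAME tokens cover both keywords and identifiers; drop the keywords.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            if fnmatch.fnmatchcase(tok.string, pattern):
                yield tok.start[0], tok.string


if __name__ == "__main__":
    sample = "for ijk in items:\n    total = ijk + j  # ijx in a comment\n"
    for line, name in identifiers(sample, "ij*"):
        print(line, name)
```

Because matching happens on tokens rather than raw text, the "ijx" inside the comment is not reported, which is the kind of false positive a plain text search would produce.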
A trivial search:
S
finds all string literals. This is often a pretty big set :-}
A search
UnicodeStrings
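finds all string literals that are lexically defined as Unicode strings (u"...").
What you want are all strings that aren't UnicodeStrings. The SCSE provides a "subtract" operator that removes hits of one kind that overlap hits of another kind. So your question, "which strings aren't unicode", is expressed concisely as: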
S-UnicodeStrings
All hits shown will be strings that aren't unicode strings, which is your precise question.
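For readers without SCSE at hand, the same set difference can be approximated with Python's own tokenizer. This is only a rough sketch under stated assumptions, not how SCSE works internally: the helper names are hypothetical, "unicode" is decided purely lexically by a u/U prefix (as in the UnicodeStrings description), and on Python 3.12+ f-strings are tokenized separately and are not reported by this code.

```python
import io
import tokenize


def string_literals(source):
    """Yield (line, column, text) for every string literal token (the "S" set)."""
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == tokenize.STRING:
            yield tok.start[0], tok.start[1], tok.string


def is_unicode_literal(text):
    """True if the literal carries a u/U prefix, judged lexically only."""
    # The prefix is whatever precedes the first quote character.
    positions = [p for p in (text.find('"'), text.find("'")) if p != -1]
    prefix = text[:min(positions)].lower()
    return "u" in prefix


def non_unicode_strings(source):
    """Approximate S - UnicodeStrings: string hits minus the u-prefixed ones."""
    return [hit for hit in string_literals(source)
            if not is_unicode_literal(hit[2])]


if __name__ == "__main__":
    sample = 'a = u"x"\nb = "y"\nc = b"z"\n'
    for line, col, text in non_unicode_strings(sample):
        print(f"{line}:{col}: {text}")
```

Each reported hit carries its line and column, so the output can be redirected to a file as a simple log of the literals that still need attention.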
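The SCSE provides logging facilities so that you can record the hits. You can run SCSE from a command line, enabling a scripted query for your answer. Putting this into a command script would provide a tool that gives your answer directly.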