Question

I have a file dict.txt that has all words in the English language.

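The user will enter a partial word.

Typical input is a partial word in which each unknown character is marked by a placeholder, preferably an underscore rather than a hyphen (-).

I want the program to produce a list of all the best matches found in the dictionary.

Example: if the partial word is r--, the list would contain run, ran, rat, rob, and so on.

Is there any way to do this using for loops?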

Answer 1

A convenient way to do this is to use [...]. Since it is unclear whether this question is homework, the details are left to the reader.

Answer 2

Instead of using _ to denote wildcards, use \w. Add \b to the beginning and end of the pattern, then just run the dictionary through a regexp matcher. So -un--- becomes:

>>> import re
>>> re.findall(r'\b\wun\w\w\w\b', "run runner bunt bunter bunted bummer")
['runner', 'bunter', 'bunted']

\w matches any word character. \b matches any word boundary.
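
The translation from a partial word to a regular expression can be automated. The following is a minimal sketch, not part of the original answer: the helper name pattern_to_regex and its wildcard parameter are made up for illustration. Note that \w also matches digits and underscores, which may or may not matter for a given dictionary.

```python
import re

def pattern_to_regex(pattern, wildcard="-"):
    """Translate a partial word such as '-un---' into a compiled regex."""
    # Each wildcard becomes \w; every other character is matched literally.
    body = "".join(r"\w" if ch == wildcard else re.escape(ch) for ch in pattern)
    # \b on both sides keeps the match from starting or ending mid-word.
    return re.compile(r"\b" + body + r"\b")

text = "run runner bunt bunter bunted bummer"
print(pattern_to_regex("-un---").findall(text))  # ['runner', 'bunter', 'bunted']
```

Passing wildcard="_" would let the user type underscores instead of hyphens, as the question prefers.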

Answer 3

If you want to do this repeatedly, you should build an index:

wordlist = [word.strip() for word in "run, ran, rat, rob, fish, tree".split(',')]

from collections import defaultdict

class Index(object):

    def __init__(self, wordlist=()):
        self.trie = defaultdict(set)
        for word in wordlist:
            self.add_word(word)

    def add_word(self, word):
        """ adds word to the index """
        # save the length of the word
        self.trie[len(word)].add(word)    
        for marker in enumerate(word):
            # add word to the set of words with (pos,char)
            self.trie[marker].add(word)


    def find(self, pattern, wildcard='-'):
        # get all words with matching length as candidates
        # (copied, so the intersections below do not shrink the stored set)
        candidates = set(self.trie[len(pattern)])

        # keep only the words that have every fixed (pos, char) marker
        for marker in enumerate(pattern):
            if marker[1] != wildcard:
                candidates &= self.trie[marker]

            # exit early if there are no candidates
            if not candidates:
                return None

        return candidates


with open('dict.txt', 'rt') as lines:
    wordlist = [word.strip() for word in lines]

s = Index(wordlist)
print(s.find("r--"))

Tries are designed for searching strings. This is a simple variant that uses a single dictionary.
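
To make the single-dictionary layout concrete, here is a small sketch that builds the same kind of index by hand and answers one query with plain set intersections. It mirrors the key scheme used above (word length, plus (position, character) pairs); the variable names are illustrative only.

```python
from collections import defaultdict

index = defaultdict(set)
for word in ["run", "ran", "rat", "rob", "fish", "tree"]:
    index[len(word)].add(word)        # integer key: words of this length
    for marker in enumerate(word):    # tuple key: (position, character)
        index[marker].add(word)

# "r-n" means: length 3, 'r' at position 0, 'n' at position 2.
matches = index[3] & index[(0, 'r')] & index[(2, 'n')]
print(sorted(matches))  # ['ran', 'run']
```

Because the & operator returns a new set, the stored sets are left untouched and the index can be queried again.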

Answer 4

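This sounds like a job for a search algorithm of some kind, but I'll give you a starting point.

One solution might be to index the file (if that can be done in a reasonable time) into a tree structure, with each character representing a node value and each child being one of the characters that can follow it. You can then traverse the tree using the input as a map. A character tells you which node to go to next, and a dash means you should include all child nodes. Each time you hit a leaf n levels deep, where n is the length of the input, you know you have found a match.

The nice thing is that once you have built the index, your searches will be much faster. It is the indexing that can take forever...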
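
The tree described above can be sketched with nested dictionaries. This is one possible reading of the idea, not the answerer's code: build_trie, search, and the END marker are hypothetical names. It uses an explicit end-of-word marker rather than relying on leaves, since one word can be a prefix of another.

```python
END = ""  # sentinel key marking the end of a word (never a real character)

def build_trie(words):
    """Nested dicts: each key is a character, END marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def search(node, pattern, prefix=""):
    """Yield every word in the trie that matches pattern ('-' is a wildcard)."""
    if not pattern:
        if END in node:          # pattern used up exactly at a word end
            yield prefix
        return
    head, rest = pattern[0], pattern[1:]
    if head == "-":
        branches = [ch for ch in node if ch != END]   # follow every child
    else:
        branches = [head] if head in node else []     # follow one child
    for ch in branches:
        for word in search(node[ch], rest, prefix + ch):
            yield word

trie = build_trie(["run", "ran", "rat", "rob", "fish", "tree"])
print(sorted(search(trie, "r--")))  # ['ran', 'rat', 'rob', 'run']
print(sorted(search(trie, "r-n")))  # ['ran', 'run']
```

A fixed character costs one dictionary lookup, while each wildcard fans out over all children, so patterns with many leading wildcards visit more of the tree.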

Answer 5

This uses a bit of memory, but it does the trick:

import re
import sys

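# turn each '-' into \w and anchor the pattern on word boundaries
word = r'\b' + sys.argv[1].replace('-', r'\w') + r'\b'
print(word)

with open('data.txt', 'r') as fh:
    # reads the whole file into memory before searching
    print(re.findall(word, fh.read()))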
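
If memory is a concern, the same regular-expression idea can be applied one line at a time instead of reading the whole file. The sketch below assumes one word per line; matching_words is a hypothetical helper, and a plain list stands in for the open file handle.

```python
import re

def matching_words(lines, pattern, wildcard="-"):
    """Yield words from an iterable of lines that match pattern in full."""
    # Each wildcard becomes '.', every other character is matched literally.
    body = "".join("." if ch == wildcard else re.escape(ch) for ch in pattern)
    regex = re.compile(body + r"\Z")   # \Z: the word must end where the pattern ends
    for line in lines:
        word = line.strip()
        if regex.match(word):          # match() already anchors at the start
            yield word

sample = ["run\n", "runner\n", "ran\n", "rat\n"]  # stand-in for an open file
print(list(matching_words(sample, "r-n")))  # ['run', 'ran']
```

Since an open file is itself an iterable of lines, it can be passed in place of sample, and only one line is held in memory at a time.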

Answer 6

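A few approaches occur to me:

The first is to preprocess your dictionary into words[length][position][char] = set(matching words); your query then becomes the intersection of all the relevant word sets. Very fast, but memory-intensive and a lot of set-up work.

Example:

# search for 'r-n'
matches = list(words[3][0]['r'] & words[3][2]['n'])

The second is a linear scan of the dictionary using regular expressions; slow, but with a minimal memory footprint.

Example: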

import re

foundMatch = re.compile(r'r.n$').match    # '$' keeps longer words such as "rinse" from matching
matches = [word for word in allWords if foundMatch(word)]

The third is a recursive search into a word trie;

And the fourth, which sounds like what you want, is a naive word matcher:

with open('dictionary.txt') as inf:
    all_words = [word.strip().lower() for word in inf]  # one word per line

find_word = 'r-tt-r'
matching_words = []
for word in all_words:
    if len(word) == len(find_word):
        # every position must either match exactly or be a wildcard
        if all(find == ch or find == '-' for find, ch in zip(find_word, word)):
            matching_words.append(word)

Edit: full code for the first option follows:

from collections import defaultdict
from functools import reduce   # a builtin on Python 2, must be imported on Python 3
import operator

try:
    inp = raw_input    # Python 2.x
except NameError:
    inp = input        # Python 3.x

class Words(object):
    @classmethod
    def fromFile(cls, fname):
        with open(fname) as inf:
            return cls(inf)

    def __init__(self, words=None):
        super(Words,self).__init__()
        self.words = set()
        # index[length][position][char] -> set of words
        self.index = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        _addword = self.addWord
        for word in words or ():
            _addword(word.strip().lower())

    def addWord(self, word):
        self.words.add(word)
        _ind = self.index[len(word)]
        for ind,ch in enumerate(word):
            _ind[ind][ch].add(word)

    def findAll(self, pattern):
        pattern = pattern.strip().lower()
        _ind = self.index[len(pattern)]
        fixed = [_ind[ind][ch] for ind,ch in enumerate(pattern) if ch != '-']
        if not fixed:
            # all wildcards: only the length constrains the match
            return set(word for word in self.words if len(word) == len(pattern))
        return reduce(operator.and_, fixed)

def main():
    print('Loading dict... ')
    words = Words.fromFile('dict.txt')
    print('done.')

    while True:
        seek = inp('Enter partial word ("-" is wildcard, nothing to exit): ').strip()
        if seek:
            print("Matching words: " + ' '.join(words.findAll(seek)) + '\n')
        else:
            break

if __name__=="__main__":
    main()
