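Without knowing anything about tuning, I would start by assuming I have a list of names and their frequencies, then build a dictionary mapping each prefix to the set of names with that prefix, then turn each set of names into a list of only the top 5 w.r.t. frequency.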
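Those steps can be sketched on a handful of names held in memory. This is only an illustration under my own assumptions: the names `pre2names` and `pre2top`, the limit of 2, and the use of `heapq.nlargest` are hypothetical choices for the sketch, and the frequencies are copied from the data shown in this answer.

```python
from collections import defaultdict
import heapq

MAX_SUGGEST = 2  # assumed small limit so the truncation is visible on toy data

# Toy frequencies copied from the data shown in this answer
name2freq = {'OLIVER': 8427, 'OLLIE': 1130, 'OLLY': 556,
             'JACK': 7031, 'JAMES': 4351, 'JACOB': 4308}

# Step 1: map every prefix to the set of names that start with it
pre2names = defaultdict(set)
for name in name2freq:
    for i in range(1, len(name) + 1):
        pre2names[name[:i]].add(name)

# Step 2: reduce each set to its most frequent names, best first
pre2top = {pre: heapq.nlargest(MAX_SUGGEST, names, key=name2freq.get)
           for pre, names in pre2names.items()}

print(pre2top['OL'])   # ['OLIVER', 'OLLIE']
print(pre2top['JA'])   # ['JACK', 'JAMES']
```

Keeping a set per prefix and selecting the top entries afterwards follows the description literally; sorting all names once by popularity and appending in that order gives the same lists.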
Using a list of boys' names only, derived from <a href="http://www.ons.gov.uk/ons/publications/re-re-reference-tables.html?edimation=tcm:77-243746" rel="nofollow">here</a> and massaged to create a <a href="https://docs.google.com/open?id=0Bw2-lCRSFikqdnlhYng4OXR6Zzg" rel="nofollow">text file</a> in which each line is an integer frequency of occurrence, some spaces, then a name, like this:
8427 OLIVER
7031 JACK
6862 HARRY
5478 ALFIE
5410 CHARLIE
5307 THOMAS
5256 WILLIAM
5217 JOSHUA
4542 GEORGE
4351 JAMES
4330 DANIEL
4308 JACOB
...
The following code builds the dictionary:
from collections import defaultdict

MAX_SUGGEST = 5

def gen_autosuggest(name_freq_file_name):
    # read "frequency name" lines, keeping the first frequency seen per name
    with open(name_freq_file_name) as f:
        name2freq = {}
        for nf in f:
            freq, name = nf.split()
            if name not in name2freq:
                name2freq[name] = int(freq)
    pre2suggest = defaultdict(list)
    for name, freq in sorted(name2freq.items(), key=lambda x: -x[1]):
        # in decreasing order of popularity
        for i, _ in enumerate(name, 1):
            prefix = name[:i]
            pre2suggest[prefix].append((name, name2freq[name]))
    # set max suggestions
    return {pre: namefs[:MAX_SUGGEST]
            for pre, namefs in pre2suggest.items()}

if __name__ == '__main__':
    pre2suggest = gen_autosuggest('2010boysnames_popularity_engwales2.txt')
If you give the dict a prefix, it returns your suggestions (along with their frequencies in this case, but those can be discarded if needed):
>>> len(pre2suggest)
15303
>>> pre2suggest['OL']
[('OLIVER', 8427), ('OLLIE', 1130), ('OLLY', 556), ('OLIVIER', 175), ('OLIWIER', 103)]
>>> pre2suggest['OLI']
[('OLIVER', 8427), ('OLIVIER', 175), ('OLIWIER', 103), ('OLI', 23), ('OLIVER-JAMES', 16)]
>>>
Look, no tries :-)
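If only the names are wanted, the frequencies can be dropped at lookup time. A minimal sketch, assuming a hypothetical helper `suggest_names` and a tiny hand-built dict in the same shape as `pre2suggest`; upper-casing the prefix is my own assumption, made because the data file holds names in capitals:

```python
def suggest_names(pre2suggest, prefix):
    """Return just the names for a prefix, most popular first.

    Hypothetical helper: assumes pre2suggest maps an upper-case prefix
    to a list of (name, frequency) tuples already sorted by popularity.
    """
    # dict.get avoids a KeyError for prefixes that match no name
    return [name for name, _freq in pre2suggest.get(prefix.upper(), [])]


# Tiny stand-in with the same shape as the real dict
sample = {'OLI': [('OLIVER', 8427), ('OLIVIER', 175), ('OLIWIER', 103)]}
print(suggest_names(sample, 'oli'))   # ['OLIVER', 'OLIVIER', 'OLIWIER']
print(suggest_names(sample, 'zz'))    # []
```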
<strong>Time killer</strong>
If it takes a long time to run, you could pre-compute the dict and save it to a file, then load the pre-computed values when needed using the pickle module:
>>> import pickle
>>>
>>> savename = 'pre2suggest.pcl'
>>> with open(savename, 'wb') as f:
...     pickle.dump(pre2suggest, f)
...
>>> # restore it
>>> with open(savename, 'rb') as f:
...     p2s = pickle.load(f)
...
>>> p2s == pre2suggest
True
>>>
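The save and restore steps could be folded into one call that builds the dict only when no saved copy exists yet. A rough sketch, assuming a hypothetical helper `load_or_build` whose `build` argument is any zero-argument callable that returns the object to cache; the demo uses a toy dict and a temporary directory so it does not depend on the names file:

```python
import os
import pickle
import tempfile


def load_or_build(cache_name, build):
    """Load a pickled object from cache_name, or build it and cache it.

    Hypothetical helper: `build` is a zero-argument callable, e.g.
    lambda: gen_autosuggest('2010boysnames_popularity_engwales2.txt')
    """
    if os.path.exists(cache_name):
        with open(cache_name, 'rb') as f:
            return pickle.load(f)
    result = build()
    with open(cache_name, 'wb') as f:
        pickle.dump(result, f)
    return result


# Demo with a toy dict in place of the real pre2suggest
with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, 'pre2suggest.pcl')
    toy = lambda: {'OL': [('OLIVER', 8427)]}
    first = load_or_build(cache, toy)    # builds the dict and writes the pickle
    second = load_or_build(cache, toy)   # reads the pickle back instead
    print(first == second)               # True
```

Note that the cache is never refreshed when the source data changes, so the pickle file has to be deleted by hand in that case. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.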