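I have a list of several thousand synonyms. I also have tens of thousands of documents that I want to search for these terms. What is an efficient way to do this in Python (or pseudocode)?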
# this works for single-word synonyms, but there are multi-word synonyms too
synonymSet = set([...])
wordsInDocument = set([...])
synonymsInDocument = synonymSet.intersection(wordsInDocument)
# this would work, but sounds slow
matches = []
for document in documents:
    for synonym in synonymSet:
        if synonym in document:
            matches.append(synonym)
Is there a good solution to this problem, or will it just take a while? Thank you in advance.
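One possible direction, offered only as a sketch, is to extend the set-intersection idea to multi-word synonyms by comparing word n-grams instead of single words. The names `tokenize`, `build_synonym_index`, and `find_synonyms` are hypothetical helpers made up for this illustration, and the sketch assumes that case-insensitive matching on whole words is acceptable.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (assumption: \\w+ is a good enough tokenizer)."""
    return re.findall(r"\w+", text.lower())

def build_synonym_index(synonyms):
    """Turn each synonym into a tuple of words; do this once for all documents."""
    index = {tuple(tokenize(s)) for s in synonyms}
    index.discard(())  # ignore synonyms that contain no word characters
    max_len = max((len(s) for s in index), default=0)
    return index, max_len

def find_synonyms(text, index, max_len):
    """Return the set of synonyms (as space-joined strings) that occur in text."""
    words = tokenize(text)
    found = set()
    # Check every run of 1..max_len consecutive words against the index.
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            ngram = tuple(words[i:i + n])
            if ngram in index:
                found.add(" ".join(ngram))
    return found

# Example usage with made-up data.
index, max_len = build_synonym_index(["big apple", "city", "huge"])
result = find_synonyms("The Big Apple is a large city.", index, max_len)
```

With this approach the work per document should grow with the number of words in the document and the length of the longest synonym, rather than with the number of synonyms, because each lookup is a set membership test. If the synonyms need substring rather than whole-word matching, a multi-pattern string matcher such as Aho-Corasick may be a better fit.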