I want to measure the similarity between two strings. en.wikipedia has examples of several string similarity metrics. code.google has a Python implementation of Levenshtein distance.
Is there a better algorithm (and hopefully a Python library) under these constraints:
- I want to do fuzzy matches between strings, e.g. matches("Hello, All you people", "hello, all You peopl") should return True.
- False negatives are acceptable; false positives are not, except in extremely rare cases.
- This is done in a non-real-time setting, so speed is not much of a concern.
- [Edit] I am comparing multi-word strings.
Would something other than Levenshtein distance (or Levenshtein ratio) be a better algorithm for my case?
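
To make the constraints concrete, here is a minimal sketch of the kind of check I mean. It uses difflib.SequenceMatcher from the standard library as a stand-in, not Levenshtein distance. The normalize helper and the 0.9 threshold are assumptions chosen for illustration; a high threshold is meant to trade false negatives for fewer false positives.

```python
import difflib


def normalize(s):
    # Lowercase and collapse runs of whitespace so that case and spacing
    # differences do not count against the match.
    return " ".join(s.lower().split())


def matches(a, b, threshold=0.9):
    # SequenceMatcher.ratio() returns a similarity score in [0, 1].
    # The threshold is an assumed cutoff: raising it rejects more pairs,
    # which favors false negatives over false positives.
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


print(matches("Hello, All you people", "hello, all You peopl"))
print(matches("Hello, All you people", "Goodbye, everyone"))
```

The first call is the pair from the constraints above and should pass the threshold; the second is an unrelated string that should not.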