Are there any string comparison algorithms out there that are "better" than Levenshtein distance?

I've been using it for a project I'm working on, but some of the results aren't what I would choose. For example:

Comparing "Date" to

  1. "State": it has a Levenshtein distance of 2
  2. "Today's Date": it has a Levenshtein distance of 9

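Of course, this is what the algorithm is supposed to do, but I'm curious whether anyone knows of something that would rank higher any comparison string containing an exact match of the source string ("Date"). That is, "Today's Date" would rank higher because it has "Date" in it.

Bonus points if you can find a .NET library that implements this.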

Best answer

I think the way to do that would be to tokenize the string into words before you apply Levenshtein. Another option is Jaro-Winkler distance.
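As a rough illustration of the tokenizing idea, here is a minimal Python sketch (Python rather than .NET, purely for brevity). The helper names `levenshtein` and `best_token_distance`, the whitespace tokenization, and the case-insensitive comparison are all assumptions made for this example, not something the answer specifies.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def best_token_distance(query: str, candidate: str) -> int:
    """Smallest distance between the query and any whitespace-separated token.

    Hypothetical helper: splitting on whitespace and lowercasing are
    assumptions made for this sketch.
    """
    tokens = candidate.split() or [candidate]
    return min(levenshtein(query.lower(), token.lower()) for token in tokens)


candidates = ["State", "Today's Date"]
# Sort so that the candidate with the closest-matching token comes first.
ranked = sorted(candidates, key=lambda c: best_token_distance("Date", c))
```

With this scoring, "Today's Date" sorts ahead of "State", because one of its tokens matches "Date" exactly.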

There is a .NET library, SimMetrics, that seems to cover several alternatives.
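For readers who want to see what the Jaro-Winkler option computes, here is a minimal Python sketch of the standard definition. It is not the SimMetrics API; the function names and the defaults `prefix_scale=0.1` and `max_prefix=4` (the commonly used values) are assumptions for this example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this many positions.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both strings' matched characters in order and count mismatched pairs.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)
```

Note that Jaro-Winkler rewards a shared prefix, so applied to whole strings it does not by itself favor a match at the end of a longer string such as "Today's Date"; scoring each token separately is one way to combine the two suggestions.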

Other answers

You may want to find the longest common subsequence.
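A minimal Python sketch of that idea follows. The names `lcs_length` and `lcs_score`, the lowercasing, and the normalization by query length are assumptions chosen for this example; the answer itself only names the technique.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the best subsequence that ends before both characters.
                current.append(previous[j - 1] + 1)
            else:
                # Otherwise carry over the better of skipping ca or skipping cb.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_score(query: str, candidate: str) -> float:
    """Fraction of the query's characters covered by the LCS (1.0 means all)."""
    if not query:
        return 0.0
    return lcs_length(query.lower(), candidate.lower()) / len(query)
```

Under these assumptions, `lcs_score("Date", "Today's Date")` is 1.0 and `lcs_score("Date", "State")` is 0.75, so the candidate containing the whole query ranks higher. A subsequence need not be contiguous, so a long unrelated string can also score well; some extra weighting may be needed in practice.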

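To do this, you need some context about how the comparison will be used.

If you are trying to look up addresses, then "Nosuch STREET" might be a perfect match for "Nosuck ROAD"; or, for a no-fly list, you would want all 20 spellings of Gadafi to match.

If you are trying to analyze how much a historical text changed as it was copied, then you need a different algorithm.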




