Better compare string method

My question is: what is the fastest way (quality matters too, but less) to compare two strings?

I'm looking for the most efficient way to compare two strings. Some of the strings I compare can be over 5000 characters long. I'm comparing a list of about 80 strings against another list of about 200 strings, and it takes forever, even though I'm using threads. I'm using the StringUtils.getLevenshteinDistance(CharSequence s, CharSequence t) method from Apache Commons Lang 3. Is there a better way? My method looks like this:

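// Assumes: import java.util.List; import org.apache.commons.lang3.StringUtils;
// me, bestScore and bestString are instance fields of the enclosing class.
private void compareMe() {
  List<String> compareStrings = MainController.getInstance().getCompareStrings();
  for (String compare : compareStrings) {
    int levenshteinDistance = StringUtils.getLevenshteinDistance(me, compare);
    // Keep the closest candidate seen so far.
    if (bestScore > levenshteinDistance
          && levenshteinDistance > -1) {
      bestScore = levenshteinDistance; // instance field
      bestString = compare; // instance field
    }
  }
}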
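
For context on why this is slow: Levenshtein distance is normally computed with a dynamic-programming table of (n + 1) x (m + 1) cells, so each pair costs O(n*m). For two 5000-character strings that is about 5000 x 5000 = 25,000,000 cell updates, and 80 x 200 = 16,000 pairs puts the worst case around 4 x 10^11 updates. The sketch below is a generic two-row version of that dynamic program, shown only to make the cost visible; the class and method names (EditDistance, levenshtein) are hypothetical and this is not the Commons Lang source.

```java
/**
 * Hypothetical helper: classic dynamic-programming Levenshtein distance,
 * keeping only two rows of the (n+1) x (m+1) table.
 * Time O(n*m), extra memory O(min(n, m)).
 */
final class EditDistance {

    public static int levenshtein(CharSequence s, CharSequence t) {
        // Make t the shorter string so the rows are as small as possible.
        if (s.length() < t.length()) {
            CharSequence tmp = s;
            s = t;
            t = tmp;
        }
        int n = s.length(), m = t.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from "" to the first j chars of t
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // distance from the first i chars of s to ""
            char si = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (si == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),       // deletion
                                   prev[j - 1] + cost);         // substitution or match
            }
            int[] swap = prev; // the finished row becomes "previous"
            prev = curr;
            curr = swap;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

Every one of the n*m cells is visited for every pair, which is why shortcuts that avoid building the table at all pay off.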

Here are two sample strings that should score as a close match:

String 1:

SELECT 
CORP_VENDOR_NAME as "Corporate Vendor Name",
CORP_VENDOR_REF_ID as "Reference ID",
MERCHANT_ID as "Merchant ID",
VENDOR_CITY as "City",
VENDOR_STATE as "State",
VENDOR_ZIP as "Zip",
VENDOR_COUNTRY as "Country",
REMIT_VENDOR_NAME as "Remit Name",
REMIT_VENDOR_REF_ID as " Remit Reference ID",
VENDOR_PRI_UNSPSC_CODE as "Primary UNSPSC"
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE VENDOR_REFERENCE_ID in 
(SELECT distinct CORP_VENDOR_REF_ID
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE CORP_VENDOR_REF_ID =  ${request.corp_vendor_id}; )

String 2:

SELECT 
CORP_VENDOR_NAME as "Corporate Vendor Name",
CORP_VENDOR_REF_ID as "Reference ID",
MERCHANT_ID as "Merchant ID",
VENDOR_CITY as "City",
VENDOR_STATE as "State",
VENDOR_ZIP as "Zip",
VENDOR_COUNTRY as "Country",
REMIT_VENDOR_NAME as "Remit Name",
REMIT_VENDOR_REF_ID as " Remit Reference ID",
VENDOR_PRI_UNSPSC_CODE as "Primary UNSPSC"
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE VENDOR_REFERENCE_ID in 
(SELECT distinct CORP_VENDOR_REF_ID
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE CORP_VENDOR_REF_ID =  ACQ-169013 )

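You'll notice that the only difference is the ${request.corp_vendor_id}; near the end of string 1 (string 2 has ACQ-169013 in its place). This gives a score of 26 from the Levenshtein distance method.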
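
Where the 26 comes from, plus a shortcut that neither the question nor the answer mentions (offered as an assumption-labeled sketch): stripping the longest common prefix and suffix does not change the Levenshtein distance, so only the differing middle needs the O(n*m) table. In the two samples, everything up to "WHERE CORP_VENDOR_REF_ID =  " and the trailing " )" is shared, leaving the 26-character ${request.corp_vendor_id}; against the 10-character ACQ-169013. Those cores have no characters in common, so the cheapest edit substitutes 10 characters and deletes 16, giving max(26, 10) = 26. TrimCommon and cores are hypothetical names; the demo uses only the last line because all earlier lines are identical and would be trimmed anyway.

```java
/**
 * Hypothetical helper: removes the longest common prefix and suffix of two
 * strings. Levenshtein distance is unchanged by this, so only the differing
 * "cores" need to go through the O(n*m) dynamic program.
 */
final class TrimCommon {

    /** Returns {coreOfA, coreOfB} after dropping the shared prefix and suffix. */
    public static String[] cores(String a, String b) {
        int start = 0;
        int maxPrefix = Math.min(a.length(), b.length());
        while (start < maxPrefix && a.charAt(start) == b.charAt(start)) {
            start++;
        }
        int endA = a.length(), endB = b.length();
        // Scan the suffix, but never back into the already-matched prefix.
        while (endA > start && endB > start
                && a.charAt(endA - 1) == b.charAt(endB - 1)) {
            endA--;
            endB--;
        }
        return new String[] { a.substring(start, endA), b.substring(start, endB) };
    }

    public static void main(String[] args) {
        String tail1 = "WHERE CORP_VENDOR_REF_ID =  ${request.corp_vendor_id}; )";
        String tail2 = "WHERE CORP_VENDOR_REF_ID =  ACQ-169013 )";
        String[] c = cores(tail1, tail2);
        // prints: ${request.corp_vendor_id}; | ACQ-169013
        System.out.println(c[0] + " | " + c[1]);
    }
}
```

For 5000-character SQL strings that differ only near the end, this turns a 5000 x 5000 table into something like 26 x 10.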

Best answer

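You should look for shortcuts in your comparison logic that let you skip some of the computation. For example, if you want to minimize the Levenshtein distance globally, you don't even need to compute it when the difference between the two string lengths is greater than or equal to your current best Levenshtein distance, because the length difference is a lower bound on the distance.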
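
A minimal sketch of that shortcut, assuming the distance function is passed in (for example StringUtils::getLevenshteinDistance from Commons Lang). Because |len(a) - len(b)| never exceeds the Levenshtein distance, any candidate whose length gap is already at least the best score so far is skipped without calling the expensive function. LengthPrune and closest are hypothetical names; unlike the question's code, the best score and best string are local variables rather than fields, so no mutable state is shared between threads.

```java
import java.util.List;
import java.util.function.ToIntBiFunction;

/**
 * Sketch of the length-difference shortcut: |len(a) - len(b)| is a lower
 * bound on the Levenshtein distance, so candidates that cannot beat the
 * current best score are skipped before the distance is computed.
 */
final class LengthPrune {

    public static String closest(String target, List<String> candidates,
                                 ToIntBiFunction<String, String> distance) {
        int bestScore = Integer.MAX_VALUE;
        String bestString = null;
        for (String candidate : candidates) {
            int lowerBound = Math.abs(target.length() - candidate.length());
            // Only strictly better matches replace the best, so an equal
            // lower bound is already enough to skip the expensive call.
            if (lowerBound >= bestScore) {
                continue;
            }
            int d = distance.applyAsInt(target, candidate);
            if (d < bestScore) {
                bestScore = d;
                bestString = candidate;
            }
        }
        return bestString;
    }
}
```

Visiting candidates in order of increasing length gap tends to find a small best score early, so more of the later candidates get pruned.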

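For example, if your current best Levenshtein distance is 50, you can skip comparing two strings of lengths 100 and 180, because their Levenshtein distance is at least 180 - 100 = 80.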
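
The same idea can be pushed inside the distance computation: pass the current best as a cutoff and abandon the table as soon as every cell in a row exceeds it. Commons Lang 3 has a StringUtils.getLevenshteinDistance overload that takes a threshold and returns -1 when the distance is larger (check the Javadoc for your version). The sketch below is an independent implementation of that contract under a hypothetical class name (BoundedLevenshtein), not the library's code.

```java
/**
 * Hypothetical helper: Levenshtein distance with a cutoff. Returns the exact
 * distance if it is <= threshold, otherwise -1.
 */
final class BoundedLevenshtein {

    public static int distance(CharSequence s, CharSequence t, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must be >= 0");
        }
        int n = s.length(), m = t.length();
        // Length-gap shortcut: the gap is a lower bound on the distance.
        if (Math.abs(n - m) > threshold) {
            return -1;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            char si = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (si == t.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so once every cell in a row exceeds
            // the threshold, the final distance must exceed it too.
            if (rowMin > threshold) {
                return -1;
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[m] <= threshold ? prev[m] : -1;
    }
}
```

In the question's loop you could call it with threshold bestScore - 1, so that only strictly better matches return a value, and treat -1 as "skip"; that is the kind of result the levenshteinDistance > -1 check in the question's code would handle.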
