My question is: what is the fastest way to compare two strings? Quality matters too, but less than speed.
I'm looking for the most efficient way to compare two strings. Some of the strings I compare can be more than 5000 characters long. I'm comparing a list of about 80 strings against another list of about 200 strings. It takes forever, even though I'm multithreading. I'm using the `StringUtils.getLevenshteinDistance(CharSequence s, CharSequence t)` method from Apache Commons Lang 3. My method looks like this. Is there a better way?
private void compareMe() {
    List<String> compareStrings = MainController.getInstance().getCompareStrings();
    for (String compare : compareStrings) {
        int levenshteinDistance = StringUtils.getLevenshteinDistance(me, compare);
        if (bestScore > levenshteinDistance
                && levenshteinDistance > -1) {
            bestScore = levenshteinDistance; //global variable
            bestString = compare; //global variable
        }
    }
}
Here is a sample of two strings that should score well against each other:
String 1:
SELECT
CORP_VENDOR_NAME as "Corporate Vendor Name",
CORP_VENDOR_REF_ID as "Reference ID",
MERCHANT_ID as "Merchant ID",
VENDOR_CITY as "City",
VENDOR_STATE as "State",
VENDOR_ZIP as "Zip",
VENDOR_COUNTRY as "Country",
REMIT_VENDOR_NAME as "Remit Name",
REMIT_VENDOR_REF_ID as " Remit Reference ID",
VENDOR_PRI_UNSPSC_CODE as "Primary UNSPSC"
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE VENDOR_REFERENCE_ID in
(SELECT distinct CORP_VENDOR_REF_ID
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE CORP_VENDOR_REF_ID = ${request.corp_vendor_id}; )
String 2:
SELECT
CORP_VENDOR_NAME as "Corporate Vendor Name",
CORP_VENDOR_REF_ID as "Reference ID",
MERCHANT_ID as "Merchant ID",
VENDOR_CITY as "City",
VENDOR_STATE as "State",
VENDOR_ZIP as "Zip",
VENDOR_COUNTRY as "Country",
REMIT_VENDOR_NAME as "Remit Name",
REMIT_VENDOR_REF_ID as " Remit Reference ID",
VENDOR_PRI_UNSPSC_CODE as "Primary UNSPSC"
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE VENDOR_REFERENCE_ID in
(SELECT distinct CORP_VENDOR_REF_ID
FROM DSS_FIN_USER.ACQ_VENDOR_DIM
WHERE CORP_VENDOR_REF_ID = ACQ-169013 )
You'll notice that the only difference is the `${request.corp_vendor_id};` at the end of the string. That alone gives it a score of 26 from the Levenshtein distance method.
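To make the setup reproducible without the Apache Commons Lang dependency, here is a minimal self-contained sketch of the same approach. The hand-written `levenshtein` function is only a stand-in for `StringUtils.getLevenshteinDistance` (it is not the library's code), and the class name `ClosestMatchSketch`, the `closest` helper, and the sample words are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ClosestMatchSketch {

    // Stand-in for StringUtils.getLevenshteinDistance: the classic
    // dynamic-programming edit distance, keeping only two rows in memory.
    static int levenshtein(CharSequence s, CharSequence t) {
        int n = s.length();
        int m = t.length();
        if (n == 0) return m;
        if (m == 0) return n;

        int[] prev = new int[m + 1]; // distances for the previous character of s
        int[] curr = new int[m + 1]; // distances for the current character of s
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            char sc = s.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                int cost = (sc == t.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap the rows so the current row becomes the previous one.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    // Same selection logic as compareMe(): keep the candidate with the
    // smallest distance, but return it instead of writing to fields.
    static String closest(String me, List<String> candidates) {
        int bestScore = Integer.MAX_VALUE;
        String bestString = null;
        for (String candidate : candidates) {
            int distance = levenshtein(me, candidate);
            if (distance < bestScore) {
                bestScore = distance;
                bestString = candidate;
            }
        }
        return bestString;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("kitten", "mitten", "sitting");
        System.out.println(closest("fitting", candidates));
        System.out.println(levenshtein("kitten", "sitting"));
    }
}
```

Each call fills a table of roughly n × m cells, so a single pair of 5000-character strings costs about 25 million cell updates. If every string were that long, comparing 80 strings against 200 would mean 16,000 pairs, or about 4 × 10^11 updates in total, which is consistent with the run time I'm seeing.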