Question

Here is the class on gist https://gist.github.com/2605302

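I have tested it multiple times with different files, and even though the binary search makes fewer comparisons, the time it takes is ALWAYS longer. What is going wrong?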

public static int linerSearch ( String array [], String word, long resultsArray [])
{
    int comparisons = 0;
    int pos = -1;
    // I have started the timer where the search actually starts
    long start = System.nanoTime ();
    for (int i = 0; i < array.length; i++){
        comparisons = comparisons + 1;
        if (array [i].equals (word)){
            pos = i;
            break;
        }
    }
    long stop = System.nanoTime ();
    long total = stop - start;
    resultsArray [0] = total;
    resultsArray [1] = (long) (long) array.length;
    resultsArray [2]= (long) (long) comparisons;
    return pos;
}

And here is the binary search method.

public static int binarySearch (String [] array, String word, long resultsArray []) {
    int start = 0;
    int end = array.length - 1;;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;
    long start2 = System.nanoTime ();
    Arrays.sort (array);
    while (start <= end) {
        midPt = (start + end) / 2;
        comparisons2 = comparisons2 + 1;
        if (array [midPt].equalsIgnoreCase (word)) {
            pos = midPt;
            break;
        }
        else if (array [midPt].compareToIgnoreCase (word) < 0) {
            start = midPt + 1;
            comparisons2 = comparisons2 + 1;
            // the comparisons2 additions inside this else-if and the next one are a workaround to avoid breaking the else-if chain: if execution reaches the last else-if, two more comparisons have been made after the first one
        } else if (array [midPt].compareToIgnoreCase (word) > 0)  {
            comparisons2 = comparisons2 + 2;
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime ();
    long total2 = stop2 - start2;
    resultsArray [0] = total2;
    resultsArray [1] = (long) (long) array.length;
    resultsArray [2]= (long) (long) comparisons2;
    return pos;
}

Edit: I should also add that I tried it on an already sorted array without that line of code, and it still took longer when it should not have.

Answer 1

Okay, I have worked this out for you once and for all. First, here is the binary search method I used:

public static int binarySearch(String[] array, String word, long resultsArray[]) {
    int start = 0;
    int end = array.length - 1;
    int midPt;
    int pos = -1;
    int comparisons2 = 0;

    //Arrays.sort(array);

    long start2 = System.nanoTime();
    while (start <= end) {
        midPt = (start + end) / 2;
        int comparisonResult = array[midPt].compareToIgnoreCase(word);
        comparisons2++;
        if (comparisonResult == 0) {
            pos = midPt;
            break;
        } else if (comparisonResult < 0) {
            start = midPt + 1;
        } else { // comparisonResult > 0
            end = midPt - 1;
        }
    }
    long stop2 = System.nanoTime();
    long total2 = stop2 - start2;

    resultsArray[0] = total2;
    resultsArray[1] = (long) array.length;
    resultsArray[2] = (long) comparisons2;
    return pos;
}

As you can see, I reduced the number of comparisons by saving the comparison result and reusing it.

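Second, I downloaded a word list. It was already sorted ignoring case. Then I built a test method that loads the contents of that file into an array and uses both search methods to find every word in the list. The times and numbers of comparisons are then averaged separately for each method.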

I found out that you must be careful in choosing which comparison methods to use: if you Arrays.sort(...) a list and you use compareToIgnoreCase in binary search, it fails! By failing I mean that it cannot find the word from the given list even though the word actually exists there. That is because Arrays.sort(...) is a case-sensitive sorter for Strings. If you use that, you must use the compareTo(...) method with it.

So, we have two cases:

a case-insensitively sorted list and the use of compareToIgnoreCase
a case-sensitively sorted list and the use of compareTo
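The two cases can be sketched as follows. This is a minimal illustration rather than the test code used for the measurements: the class name, the `search` helper and the four-word array are made up for the example. The search takes a Comparator so that the same loop can be run with the case-sensitive order (what compareTo does) and with String.CASE_INSENSITIVE_ORDER (what compareToIgnoreCase does).

```java
import java.util.Arrays;
import java.util.Comparator;

class SortOrderMismatchSketch {
    // Hypothetical helper: binary search that compares strings with the given order.
    // It only works if 'sorted' was sorted with that same order.
    static int search(String[] sorted, String word, Comparator<String> order) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = order.compare(sorted[mid], word);
            if (c == 0) {
                return mid;
            } else if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "Banana", "cherry", "Date"};

        // Case-sensitive sort: upper-case letters come first.
        String[] caseSensitive = words.clone();
        Arrays.sort(caseSensitive); // Banana, Date, apple, cherry

        // Case-insensitive sort.
        String[] caseInsensitive = words.clone();
        Arrays.sort(caseInsensitive, String.CASE_INSENSITIVE_ORDER); // apple, Banana, cherry, Date

        // Matching orders: the word is found.
        System.out.println(search(caseSensitive, "cherry", Comparator.naturalOrder()));
        System.out.println(search(caseInsensitive, "cherry", String.CASE_INSENSITIVE_ORDER));

        // Mismatch: case-sensitive sort, case-insensitive comparison. The search
        // walks the wrong way and misses a word that is in the array.
        System.out.println(search(caseSensitive, "cherry", String.CASE_INSENSITIVE_ORDER));
    }
}
```

With this particular array the first two calls find "cherry" and the last one returns -1, which is the failure described above.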

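In addition to these choices in the binary search, you also have a choice in the linear search: whether to use equals or equalsIgnoreCase. I ran my test for all of these cases and compared them. Average results: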

Linear search with equals: time: 725536 ns; comparisons: 117941; time / comparison: 6.15 ns
Linear search with equalsIgnoreCase: time: 1064334 ns; comparisons: 117940; time / comparison: 9.02 ns
Binary search with compareToIgnoreCase: time: 1619 ns; comparisons: 16; time / comparison: 101.19 ns
Binary search with compareTo: time: 763 ns; comparisons: 16; time / comparison: 47.69 ns

So now we can clearly see your problem: the compareToIgnoreCase method takes some 16 times as long as the equals method! On average the binary search needs 16 comparisons to find a given word, which takes about 1619 ns with compareToIgnoreCase; in that time you can perform roughly 263 linear equals comparisons (about 124 in the 763 ns that the compareTo variant needs). So if you test with word lists shorter than that, the linear search is, indeed, always faster than the binary search because of the different comparison methods they use.

I also counted the number of words that the linear search was able to find faster than the binary search: 164 when using the compareTo method and 614 when using the compareToIgnoreCase method. Of the list of 235882 words, that's about 0.3 percent. So all in all I think it's still safe to say that the binary search is faster than the linear search. :)

One last point before you ask: I removed the sorting code from the binarySearch method, because that's actually an entirely different thing. Since you are comparing two searching algorithms, it's not fair to one of them if you add the cost of a sorting algorithm to its figures. I posted the following as a comment on another answer already, but I'll copy it here for completeness:

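Binary search has the added overhead of sorting. So if you only need to find one element in an array, linear search is always faster, because sorting takes at least O(n log n) time and the binary search that follows takes O(log n) time, so the O(n log n) operation dominates. Linear search runs in O(n) time, which is better than O(n log n). But once you have the array sorted, O(log n) is way better than O(n).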

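If you insist on having the sorting command inside the binarySearch method, you should be aware that with my setup, sorting that long list of words from an initially random order takes more than 140 000 000 ns, or 0.14 seconds, on average. In that time you could perform some 23 000 000 comparisons using the equals method, so you really should not use binary search if you only need to find one or two elements in an unsorted array.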
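As a rough worked estimate, the averages quoted in this answer (about 0.14 s to sort, 725536 ns per linear search with equals, 763 ns per binary search with compareTo) give a break-even point for the number of lookups. The symbol k, the number of searches on the same array, is introduced here only for illustration, and the figures are specific to this one machine and word list.

```latex
\begin{align*}
T_{\text{sort}} + k \cdot T_{\text{binary}} &< k \cdot T_{\text{linear}} \\
k &> \frac{T_{\text{sort}}}{T_{\text{linear}} - T_{\text{binary}}}
   = \frac{140\,000\,000\ \text{ns}}{725\,536\ \text{ns} - 763\ \text{ns}}
   \approx 193
\end{align*}
```

So with these measurements, sorting first only pays off once you do roughly 200 or more lookups on the same array.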

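One more thing. In this example, where you are searching for words in a String array, the cost of accessing an item in the array is negligible because the array can be kept in the computer's fast main memory. But if you had, say, a huge number of ordered files and you tried to find something in them, the cost of accessing a single file would make the cost of every other calculation negligible. So binary search would totally rock in that scenario (too).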

Answer 2

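The problem with your benchmark is that Arrays.sort(array) takes most of the time, and yet its comparisons are not counted. Linear search requires at most N comparisons. When you sort an array, you spend more than N comparisons.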

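To see that binary search is faster, you should run a test like this:

(1) Search for different elements 1000 times with linear search.

(2) Sort the array once, then search for different elements 1000 times with binary search.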
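A test along those lines could look roughly like the sketch below. It is not the asker's code: the class name, the synthetic word generator and the array size are assumptions made for the example, and it does no JIT warm-up, so the printed times are only indicative. The sort is timed together with the 1000 binary searches, so its cost is included but paid only once.

```java
import java.util.Arrays;

class SearchBenchmarkSketch {
    // Plain linear scan; returns the index of word or -1.
    static int linearSearch(String[] array, String word) {
        for (int i = 0; i < array.length; i++) {
            if (array[i].equals(word)) {
                return i;
            }
        }
        return -1;
    }

    // Binary search for an array sorted with Arrays.sort (case-sensitive),
    // so it compares with compareTo.
    static int binarySearch(String[] sorted, String word) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = sorted[mid].compareTo(word);
            if (c == 0) {
                return mid;
            } else if (c < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    // Hypothetical test data: n distinct strings in a scrambled order
    // (distinct as long as n is not a multiple of 7919).
    static String[] makeWords(int n) {
        String[] words = new String[n];
        for (int i = 0; i < n; i++) {
            words[i] = "w" + ((i * 7919L) % n);
        }
        return words;
    }

    // Returns {linearNanos, sortPlusBinaryNanos, linearHits, binaryHits}.
    static long[] run(String[] words, int lookups) {
        long t0 = System.nanoTime();
        long linearHits = 0;
        for (int k = 0; k < lookups; k++) {
            if (linearSearch(words, words[(k * 31) % words.length]) >= 0) {
                linearHits++;
            }
        }
        long linearNanos = System.nanoTime() - t0;

        String[] sorted = words.clone();
        long t1 = System.nanoTime();
        Arrays.sort(sorted); // sort once, inside the timed region
        long binaryHits = 0;
        for (int k = 0; k < lookups; k++) {
            if (binarySearch(sorted, words[(k * 31) % words.length]) >= 0) {
                binaryHits++;
            }
        }
        long binaryNanos = System.nanoTime() - t1;
        return new long[] {linearNanos, binaryNanos, linearHits, binaryHits};
    }

    public static void main(String[] args) {
        long[] r = run(makeWords(20_000), 1000);
        System.out.println("linear:        " + r[0] + " ns, hits " + r[2]);
        System.out.println("sort + binary: " + r[1] + " ns, hits " + r[3]);
    }
}
```

The hit counts are returned so that the results of the searches are actually used and both variants can be checked to have found every word.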

Answer 3

Your benchmark is flawed, for many reasons:

we don't know the contents of the file. If the searched word is at the beginning, the linear search will be faster than the binary search
the linear search compares with equals, whereas the binary search compares with equalsIgnoreCase
you don't execute the code a sufficient number of times to let the JIT compile the code

I haven't verified whether your binary search algorithm is correct, but why not use the one bundled with the JDK (in the java.util.Arrays class)?
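For reference, using the JDK method could look like the sketch below. The class name, the `find` helper and the sample words are made up for the example; Arrays.sort and Arrays.binarySearch are the real JDK calls. The array is sorted with the default case-sensitive order, which is the order the two-argument binarySearch expects.

```java
import java.util.Arrays;

class JdkBinarySearchSketch {
    // Hypothetical helper: sorts a copy of the array and looks the word up with the JDK search.
    // Returns the index in the sorted copy, or a negative value if the word is absent.
    static int find(String[] words, String word) {
        String[] sorted = words.clone();
        Arrays.sort(sorted);                      // case-sensitive natural order
        return Arrays.binarySearch(sorted, word); // expects that same order
    }

    public static void main(String[] args) {
        String[] words = {"pear", "apple", "fig"};
        System.out.println(find(words, "fig"));  // index within the sorted copy
        System.out.println(find(words, "kiwi")); // negative: not present
    }
}
```

For a case-insensitive lookup there are overloads of both methods that take a Comparator, such as String.CASE_INSENSITIVE_ORDER; the same comparator has to be passed to the sort and to the search.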

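Anyway, you don't have to measure anything. Binary search is, on average, faster than linear search. There is no need to prove that again.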

Answer 4

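Your code measures the binary search, but it also measures the sorting of the array just before the search. That will always take longer than a simple linear search.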

Answer 5

} else if (array [midPt].compareToIgnoreCase (word) > 0)  {

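You don't need this test at all. There are no other possibilities at this point in the code. It isn't equal, and it isn't less than: you have already tested for those, so it must be greater than.

So you can reduce your comparisons by 33%.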
