English 中文(简体)
unique() for arrays in javascript [duplicate]
原标题:

As everybody knows there s no built-in function to remove the duplicates from an array in javascript. I ve noticed this is also lacking in jQuery (which has a unique function for DOM selections only), and the most common snippet I found checks the entire array and a subset of it for each element (not very efficient I think), like:

for (var i = 0; i < arr.length; i++)
    for (var j = i + 1; j < arr.length; j++)
        if (arr[i] === arr[j])
            //whatever

so I made my own:

function unique (arr) {
    var hash = {}, result = [];
    for (var i = 0; i < arr.length; i++)
        if (!(arr[i] in hash)) { //it works with objects! in FF, at least
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    return result;
}

I wonder if there s any other algorithm accepted as the best for this case (or if you see any obvious flaw that could be fixed), or, what do you do when you need this in javascript (I m aware that jQuery is not the only framework and some others may have this already covered).

最佳回答

Using the object literal is exactly what I would do. A lot of people miss this technique a lot of the time, opting instead for typical array walks as the original code that you showed. The only optimization would be to avoid the arr.length lookup each time. Other than that, O(n) is about as good as you get for uniqueness and is much better than the original O(n^2) example.

function unique(arr) {
    var hash = {}, result = [];
    for ( var i = 0, l = arr.length; i < l; ++i ) {
        if ( !hash.hasOwnProperty(arr[i]) ) { //it works with objects! in FF, at least
            hash[ arr[i] ] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

// * Edited to use hasOwnProperty per comments

Time complexities to summarize

  f()    | unsorted | sorted | objects | scalar | library
____________________________________________________________
unique   |   O(n)   |  O(n)  |   no    |  yes   |    n/a
original |  O(n^2)  | O(n^2) |   yes   |  yes   |    n/a
uniq     |  O(n^2)  |  O(n)  |   yes   |  yes   | Prototype
_.uniq   |  O(n^2)  |  O(n)  |   yes   |  yes   | Underscore

As with most algorithms, there are trade offs. If you are only sorting scalar values, you re modifications to the original algorithm give the most optimal solution. However, if you need to sort non-scalar values, then using or mimicking the uniq method of either of the libraries discussed would be your best choice.

问题回答

I think your version won t work when you ll have objects or function in the array that give string representation like [Object object]. Because you can only have strings as keys in objects (in the "hash" object here). You ll need to loop into the result array to find if the new entry already exists. It will still be faster than the first method.

Prototype JS has a "uniq" method, you may get inspiration from it.

fun with fun (ctional)

function uniqueNum(arr) {
    return Object.keys(arr.reduce(
        function(o, x) {o[x]=1; return o;}, {})).map(Number);
}  

I m not an algorithm expert by any means, but I ve been keeping an eye on underscore.js. They have this as a function called uniq:

http://documentcloud.github.com/underscore/#uniq

I looked at the code in their library, and copied it here for reference (not my code, this code belongs to underscore.js):

// Produce a duplicate-free version of the array. If the array has already
// been sorted, you have the option of using a faster algorithm.
_.uniq = function(array, isSorted) {
    return _.reduce(array, [], function(memo, el, i) {
        if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) memo.push(el);
        return memo;
    });
};

EDIT: You need to walk through the rest of the underscore.js code, and I almost took this code out because of it. I left the code snippet in just in case this was still useful.

Unfortunately JS objects have no identity accessible from the language - as other posters have mentioned, using objects as keys in a dictionary will fail when different objects have equal string representations and there is no id() function in the language.

There is a way to avoid the O(n^2) all-pairs check for === identity if you can modify the objects. Pick a random string, walk the array once to check that no object has a property by that name, then just do arr[i][randomPropertyName]=1 for each i. If the next object in the array already has that property, then it is a duplicate.

Unfortunately, the above will only work for modifiable objects. It fails for array values that don t allow property setting (e.g. integers, 42[ random ]=1 just doesn t work :( )





相关问题
How to add/merge several Big O s into one

If I have an algorithm which is comprised of (let s say) three sub-algorithms, all with different O() characteristics, e.g.: algorithm A: O(n) algorithm B: O(log(n)) algorithm C: O(n log(n)) How do ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Enumerating All Minimal Directed Cycles Of A Directed Graph

I have a directed graph and my problem is to enumerate all the minimal (cycles that cannot be constructed as the union of other cycles) directed cycles of this graph. This is different from what the ...

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签