Fast Algorithm To Find Unique Items in JavaScript Array

September 6, 2009


When I needed to remove duplicate items from a very large array, I found that the classic method was not optimized and took much longer than desired. So I devised this new algorithm, which can de-duplicate a large array in a fraction of the original time.

The fastest method to find unique items in array

This method is kind of cheeky in its implementation. It uses a JavaScript object and adds every item in the array as a key. Since an object can hold each key only once, duplicate items collapse into a single entry, and we capitalize on that.

  1. Array.prototype.unique = function() {
  2.     var o = {}, i, l = this.length, r = [];
  3.     for(i=0; i<l;i++) o[this[i]] = this[i];  // each item becomes a key; duplicates overwrite
  4.     for(i in o) r.push(o[i]);                // collect the stored value of each unique key
  5.     return r;
  6. };
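
As a minimal usage sketch, the same key-based idea can be written as a standalone function, which avoids extending Array.prototype. The name `uniqueByKeys` and the sample data are made up for illustration, and the `hasOwnProperty` guard is an addition that the original does not have.

```javascript
// Hypothetical standalone variant of the object-key technique shown above.
function uniqueByKeys(items) {
    var seen = {}, result = [], i, key;
    // Pass 1: every item becomes a key; a repeated item overwrites its own slot.
    for (i = 0; i < items.length; i++) {
        seen[items[i]] = items[i];
    }
    // Pass 2: collect the stored values (not the keys) so numbers stay numbers.
    for (key in seen) {
        // Guard added for illustration: skip anything inherited via the prototype chain.
        if (Object.prototype.hasOwnProperty.call(seen, key)) {
            result.push(seen[key]);
        }
    }
    return result;
}

var colors = uniqueByKeys(["red", "green", "red", "blue", "green"]);
console.log(colors.length); // 3

// Caveat: keys are strings, so the number 1 and the string "1" share one slot.
console.log(uniqueByKeys([1, "1"]).length); // 1
```

Because keys are compared as strings, this approach suits arrays whose items are all numbers or all strings; mixed arrays may lose items that only differ by type.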

Some Thoughts On This Algorithm

This can loosely be classified as a “Hash Sieving” method, and it is related to a modified “Hash Sorting Algorithm”: every item in the array acts as a hash key, and a hash function inserts the item into a bucket, replacing any existing value when two keys collide. As such, the same idea can be applied in any programming language that offers a hash map, for faster sieving of very large arrays.
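
In JavaScript engines that support ES2015, a comparable sieve can be sketched with the built-in `Set`. This swaps the plain object for a different data structure, so it is a related technique rather than the method described in this post. Unlike object keys, a `Set` does not convert values to strings, so `1` and `"1"` stay distinct. The function name `uniqueWithSet` is an assumption made for this example.

```javascript
// Sketch only: the same "insert everything, keep one per bucket" idea using Set.
function uniqueWithSet(items) {
    // A Set stores each distinct value once and remembers insertion order.
    return Array.from(new Set(items));
}

console.log(uniqueWithSet([4, 2, 4, 7, 2]));     // [4, 2, 7]
console.log(uniqueWithSet([1, "1", 1]).length);  // 2
```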

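This algorithm has linear time complexity: it makes two passes, roughly 2n steps in the worst case, which is O(n), assuming that inserting and reading an object key take constant time on average. This is far better than what we will observe for the classic method described below.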

About the classic method

The classic (and most popular) method of finding unique items in an array runs two nested loops to compare each element with the rest of the elements. Consequently, its time complexity is quadratic, O(n²), in the worst case.

This is not a good thing when you have to find the unique items in an array of 10,000 items.

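    Array.prototype.unique = function() {
        var a = [], l = this.length;
        for (var i = 0; i < l; i++) {
            // Look for a later duplicate of this[i].
            for (var j = i + 1; j < l; j++) {
                // Duplicate found: skip this[i] and restart the scan from the next item.
                if (this[i] === this[j]) j = ++i;
            }
            // No later duplicate remains, so this is the last occurrence: keep it.
            a.push(this[i]);
        }
        return a;
    };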
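
To see where the quadratic cost comes from, the sketch below counts the comparisons made by the same nested loops. `countClassicComparisons` is a hypothetical helper written for this illustration, not part of the original code. When every item is distinct it performs n(n-1)/2 comparisons, and when every item is identical it performs only n-1.

```javascript
// Hypothetical helper: runs the classic loops but only counts === comparisons.
function countClassicComparisons(items) {
    var count = 0, l = items.length, i, j;
    for (i = 0; i < l; i++) {
        for (j = i + 1; j < l; j++) {
            count++;
            // Same skip-ahead step as the classic method.
            if (items[i] === items[j]) j = ++i;
        }
    }
    return count;
}

var distinct = [], identical = [], k;
for (k = 0; k < 1000; k++) {
    distinct.push(k);   // no duplicates: worst case
    identical.push(7);  // all duplicates: best case
}

console.log(countClassicComparisons(distinct));  // 499500, i.e. 1000 * 999 / 2
console.log(countClassicComparisons(identical)); // 999
```

For 10,000 distinct items the same formula gives 49,995,000 comparisons, against roughly 20,000 steps for the two passes of the key-based method.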

Comparing the above two algorithms

Test data: an array of N random integers.

              Average Case          Best Case
Sample (N)    Classic      New      Classic    New
50               0.43     0.25         0.01   0.02
100              0.60     0.30         0.09   0.16
500              9.57     0.87         0.1    0.2
1000            24.44     1.51         0.21   0.31
5000           584.28     7.74         0.4    1.0
10000         2360.90    15.03         0.7    1.8
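
The post does not show how these timings were collected. One way such a comparison might be set up is sketched below. The helper names, the value range of the random integers, the number of runs, and the use of `Date.now()` are all assumptions, and results will vary by engine and machine.

```javascript
// Assumed test-data generator: n random integers in [0, maxValue).
function randomIntegers(n, maxValue) {
    var out = [], i;
    for (i = 0; i < n; i++) out.push(Math.floor(Math.random() * maxValue));
    return out;
}

// Key-based method from the article, rewritten as a plain function.
function hashUnique(items) {
    var o = {}, i, l = items.length, r = [];
    for (i = 0; i < l; i++) o[items[i]] = items[i];
    for (i in o) r.push(o[i]);
    return r;
}

// Classic nested-loop method from the article, rewritten as a plain function.
function classicUnique(items) {
    var a = [], l = items.length, i, j;
    for (i = 0; i < l; i++) {
        for (j = i + 1; j < l; j++) {
            if (items[i] === items[j]) j = ++i;
        }
        a.push(items[i]);
    }
    return a;
}

// Returns the average wall-clock milliseconds per call over `runs` calls.
function averageMs(fn, data, runs) {
    var start = Date.now(), i;
    for (i = 0; i < runs; i++) fn(data);
    return (Date.now() - start) / runs;
}

var data = randomIntegers(5000, 1000000);
console.log("classic:", averageMs(classicUnique, data, 5));
console.log("new:", averageMs(hashUnique, data, 5));
```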


Conclusion

This method of finding unique items within an array is particularly useful for large arrays that resemble real-life data. When an array contains many duplicate items, there is not much difference in performance; in fact, the classic algorithm does better by a small margin. However, as the array becomes more random, the runtime of the classic algorithm grows many times over.


1 Andy L September 6, 2009 at 17:53

You work wonders at times. This is such a simple trick and yet so effective. Perhaps no one thinks about performance and perfection of algorithms as much as you do!
By the way, in line 3 of your Hash Sieving algorithm, why did you do o[this[i]] = this[i];?


2 Shamasis Bhattacharya September 6, 2009 at 18:03

Do not flatter me that much! :P

o[this[i]] = this[i]; preserves the data type of the items within the JavaScript array. This is because JavaScript object keys are always strings, and we would not want to needlessly convert a numeric array to a string array! By the way, if you are not bothered about the data type of the unique array, you can use a modified version of the algorithm that always returns strings and is faster due to less overhead.

Array.prototype.strUnique = function() {
    var o = {}, i, l = this.length, r = [];
    for(i=0; i<l;i++) o[this[i]] = null;  // only the key matters here
    for(i in o) r.push(i);                // keys are always strings
    return r;
};


3 JavascriptBank September 8, 2009 at 06:45

Very cool and good tip, thank you very much for sharing.
Can I share this snippet on my site, http://www.javascriptbank.com/ ?

Awaiting your response. Thanks


4 Shamasis Bhattacharya September 10, 2009 at 00:48

Sure. Sharing is caring! Care back for me by retaining my link and attribution. :)


5 Kevin N November 10, 2009 at 04:06

I’m trying to use this (the modified function that returns only strings) inside an embedded JS tool (I believe it uses Rhino) and I’m having difficulty. Instead of removing the duplicates, I want to append 1, 2, 3…n to the end of the duplicate strings (a space, then an integer, so Kevin,Kevin becomes Kevin,Kevin 1). I’m new to JS in general and not sure I’m creating the array correctly, so I may ask some stupid questions.
var urlarray = new Array(URLName.getString());
Should that work? As I understand it, from there I can call this function using the array?


6 Nilton November 16, 2009 at 03:11

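Reverse for loops are faster in SpiderMonkey.
As for Kevin’s question:
Array.prototype.toUnique = function() {
    // o: output entries keyed by their final string; n: occurrences seen per value
    var o = {}, n = {}, i, r = [], modified = 0, s;
    // Reverse loop, so items are collected starting from the end of the array.
    for (i = this.length - 1; i >= 0; --i) {
        if (n[this[i]]) {
            modified = 1;
            // Repeat occurrence: append a space and the running count.
            s = this[i] + " " + n[this[i]]++;
            o[s] = s;
        } else {
            o[this[i]] = this[i];
            n[this[i]] = 1;
        }
    }
    if (!modified) return this;
    for (i in o) r.push(o[i]);
    return r;
};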
