Hash Function in Javascript

Writing a Hash Function in JavaScript

So when I was writing my image preloader I had the idea where I wanted to store an image object in an array to easily look up.  Now the key would be the url, but URLs are big and can be ugly, so I really wanted to use a hashmap.  But javascript does not have a native implementation.  In addition, why should it, its arrays are associated and I could very well just use the url as the key.

Anyway, although it probably made more sense to use the URL as the key I wanted to use a hash for lookup.  Thought it would be fun.

So I did a quick google and found this http://werxltd.com/wp/2010/05/13/javascript-implementation-of-javas-string-hashcode-method.

The Algorithm

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Straight Forward Implementation

function hashCode(str){
	len = str.length;

	hash = 0;
	for(i=1; i<=len; i++){
		char = str.charCodeAt((i-1));
		hash += char*Math.pow(31,(len-i));
		hash = hash & hash; //javascript limitation to force to 32 bits
	}

	return hash;
}

Now this is very straight forward, but isnt the implementation they gave in the link above.  That implementation is actually rather clever and avoids using an expensive multiplication and power function.

The Clever Approach

function hashCode2(str){
	var hash = 0;
	if (str.length == 0) return hash;
	for (i = 0; i < str.length; i++) {
		char = str.charCodeAt(i);		hash = ((hash<<5)-hash)+char;		hash = hash & hash; // Convert to 32bit integer
	}
	return hash;
}

The Break Down

So I was a little confused and wanted to understand how these can be the same, so I dusted off my binary math hat and stepped through it.

char = str.charCodeAt(i)

Just gets the ansii value or utf-8 character code value.  So for example if str = “hello”, then str.charCodeAt(0) would return 104.  Check out this character chart.

hash = ((hash<<5)-hash)+char;

This is the nitty gritty scary part right.  Its actually not that bad and make sense once we step through it.

hash<<5

This is a simple bit shift operator of 5 to the left, but lets review bit shifts of the value 0001.

0001 is 1 in base 10
0010 is a bit shift left 1 and is 2 in base 10

var x = 1; //0001 in binary

Binary Value Bit Shifts Left Code Example Base 10 Value Algebra Equivalent Exponential
0001 0 x<<0 1 x * 1 x * 2 0
0010 1 x<<1 2 x * 2 x * 2 1
0100 2 x<<2 4 x * 2 * 2 x * 2 2
1000 3 x<<3 8 x * 2 * 2 * 2 x * 2 3

As you can see a bit shift is the same as multiplying by 2 to some power.  So lets look at what we can rewrite things to.

hash<<5 or
hash * 2^5 or
hash * 32

So we can rewrite

hash = ((hash<<5)-hash)+char; as

hash = ((hash * 32) - hash) + char; or

hash = hash * (1 * 32 - 1) + char; or

hash = hash * 31 + char;

Whew we figured out where the 31 came from.  Originally I thought it had to do with a prime number or something, but I was way off.

But what about the exponential?

To figure this out we need to step through the for loop

for (i = 0; i < str.length; i++) {
		char = str.charCodeAt(i);
		hash = ((hash<<5)-hash)+char;
		hash = hash & hash; // Convert to 32bit integer
	}

1: hash = hash*31 + char1
2: hash = (hash * 31 + char1) * 31 + char2
3: hash = ((hash * 31 + char1) * 31 + char2) * 31 + char3
etc….

What I did above was replace hash with the value from the previous iteration to show the expansion of the for loop.

So at iteration 3 of the for loop we have…

hash = ((hash * 31 + char1) * 31 + char2) * 31 + char3

Lets do some algebra on this…

hash = ((hash * 31 * 31 + char1 * 31) + char2) * 31 + char3

hash = ((hash * 31 * 31 * 31 + char1 * 31 * 31 + char2 * 31 + char3

or…

hash = hash * 31^3 + char1 * 31^2 + char2 * 31^1 + char3 * 31^0

which gives us what we were looking for…where hash = 0  to start

hash = 0 * 31^3 + char1 * 31^2 + char2 * 31^1 + char3 * 31^0

which gives….

hash = char1 * 31^2 + char2 * 31^1 + char3 * 31^0

hash = s[0] * 31 ^ (n-1) + s[1] * 31^(n-2) + s[2] * 31^(n-3)

There we go.  We have proven it works!

Comments (2)

  1. 9:33 pm, March 16, 2013Wes Widner  / Reply

    Wow, thanks for writing that out!

    I used this initially to interact with another developer’s code who used it to hash the key for data stored in HBase. For your particular application you likely won’t need to worry about it but its worth noting that the 32 bit restriction means this hash algorithm is vulnerable to producing collisions with a probability of 50% after only 77,000 hashes.

    • 9:39 pm, March 16, 2013willsmelser  / Reply

      I have not looked into how good the actual hash algorithm is, but this seems about right. Its quick and cheap though. I originally was looking for something to hash URLs for a UI. So I hope I am not looking at using unique hashes for 10,000 URLs let alone 77,000.

Leave a Reply

Allowed Tags - You may use these HTML tags and attributes in your comment.

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Pingbacks (1)

  1. 9:32 pm, March 16, 2013Favicon of mediocredeveloper.comCreating an Image Preloader