Thursday, July 13, 2017

How Maps work in Java

Maps are the indispensable part of Java, and you can't think of any application without its use.

When it comes to the background of the Maps the first thing that comes to the mind is that a Map is like a dictionary or in other words you can say its a key value pair, where you assign a key with a data, so that you can get the data from the structure just by passing the Key.

Logically the accessing of the data happens in O(1) order but in case if there is a collision(will talk about it soon) it might take 2 or 3 hops but again that is negligible when we talk about memory and high end processors our systems have.

As a coder or a computer enthusiast you should have the basic understanding of how hashing works, but in case if you don't have let me give you the basic idea of the same.

What is hashing?

Hashing, in layman language means getting the data on the basis of the value of the data.

In other words, consider having the data of the students. Now each student is assigned with a roll number, now just consider if for a particular roll number we want to get the details of the students, we have following choices:
  1. We should keep on searching for the roll number sequentially one by one and matching the roll number with the one we are looking for.
  2. Or in normal case(not in prod level or high end apps), if we create an Array of the Student Details and save the student details at the position of the roll number in the array itself.
Now coming to the point (1) this approach is overhead and is not acceptable since for every search this will going to search over whole data. Also if you consider the worst case this approach might end up in O(N) order, since the data you are actually searching would be at last of the storage.

Coming to point (2) suppose you have a total of 10 students whose roll number ranges from 1 to 10 sequentially. And if every student data is stored at the position of the roll number in the Array itself we can just write a simple function to get the data of the student by following:

       function giveMeStudentWithRollNumber(N){ =====> (A)

             //since array starts from 0
            ARRAY_WITH_DATA_OF_STUDENTS[N-1]; ====> (B)

Here in this code, the N-1 is basically a hash function which is giving me the information where the data is residing in my data structure, this data structure for me is Array.

Now  if we refine this code a little bit we can rewrite the same code as follows:

       function giveMeStudentWithRollNumber(N){ =====> (A)

             ARRAY_WITH_DATA_OF_STUDENTS[addressWhereDataIsStoredFor(N)]; ====> (B)

        function addressWhereDataIsStoredFor(N){ ====> (C)
            return N-1;

Now we have made a function named addressWhereDataIsStoredFor that will return me the value of the address where my data is stored in the data structure. This function is termed as the hashFunction and the value it returns is hashValue where the actual data is stored. In our case the hash function is pretty simple, that is  f(x) = X-1; but in reality the hash function does some calculations with prime number to guarantee no hash collision. Now Hash collision is a situation where 2 different data generate the same hashValue, thus resulting in a hash collision.

How hash map works in Java?

Some of the important points of Map in Java:

  1. Maps in Java is an interface available at java.util
  2. There are many classes that implemented the Map and HashMap is one of them which is widely used in many applications. 
  3. By far now you might have guessed it that the Map stores the data against a key which is same as N.
  4. Now for storing the data against a key it also needs a function that will tell the Java where to store the data so that the same can be retrieved back when asked for.
  5. Since the key is meant to be unique, the is of Set type in nature which means that it can not be duplicated.
Here is the basic example of Map with HashMap.

      import java.util.*;
      class MapUse{
            public static void main(String args[]){
                        Map test = new HashMap();
                        test.put(1, "one");
                        test.put(2, "two");
                        System.out.println(test.get(1)); // ===> one
                        test.put(1,"one again");
                        System.out.println(test.get(1)); // ===> one again

Now Map only accepts the Object of that class as a key, which overrides the hashCode() and uses the equals() of the same object to check when the Hash value collided and internally the Linked List is created for the same scenario.

In a nutshell, when an Object of Map is created it internally creates an Array, Node[], where each element is capable of storing the Start of the Linked List. The default size of this Array is 16 and it grows when the limit(based on, if you configured or default) met. In case of Maps the element of the Arrays are termed as Buckets. So basically we would have 16 buckets, where each bucket is capable of keeping a Linked List. 

The element of Node[] is having the following members:

        int hash;
        K key,
        V value;
        Node next;

You might be thinking why the key itself is being stored since we have the value, just return it, but there is a scenario where we have a collision, and in that case we need to return the value related to the Key we sent.

Let me explain you this by example.

Lets suppose my hashCode() function is this:

    function int hashCode(){
        //do the code and return n
        int size = size_of_hashmap;
        return n>(size) ? n%size : n; // =====> Please give attention to this.

This 'n' for some reason  (because of the poor formula of calculation of hashcode) gave the repeated value, i.e hash Collision at that time this key comes to a rescue, since at that time you have a linked list and that linked list nodes will be checked on the basis of equals() with the key, with all the nodes of the linked list.

So for summary:

  1. The Maps is basically an Array of Linked Lists and each element is termed as Bucket.
  2. Each node is having the 4 things: hashCode, key, value, next.
  3. In case of collision the key comes as rescue to return the actual intended result.
  4. The index where the value is stored is basically calculated on the basis of the size of the hash map or in other words size of the array.

No comments:

Post a Comment