Thursday, March 5, 2015

How HashMap works

How HashMap works in Java?
HashMap is a data structure that allows us to store object and retrieve it in constant time O(1). HashMap is a key-value pair mapping data structure.

HashMap works on the principle of Hashing. In hashing, hash functions are used to link key and value in HashMap.

What is hash function?
A hash function is a way of assigning a unique code for any variable/object after applying some formula/algorithm on its attributes/properties. The hashCode() method defined in the Object class which returns an integer value is a Hash function.
Read previous post on hashCode() method here. In brief, a hash function should return the same hash code each and every time when the function is called for same/equal objects.

In a hashMap, objects are retrieved using get(key) method and objects are stored using put(key, value) method. When we call put(k,v) method, hashCode() method of key ‘k’ is called which returns an integer value. This integer value is the index of an internal array that is known as a table. This is where the key,value mapping (k,v) is stored inside a hashmap.  HashMap internally stores mapping in the form of Map.Entry object which contains both key and value object.  To retrieve the object, we call get(k) method. Again, the key objects’ hashCode() method is called which generates the same hash code(an integer value) and we look up at the same index in the internal array to retrieve our information for key k. If there is only one value stored at that index, we return that value. If multiple values are stored at that index, it gets little different to handle such situations, which we call Collisions.  Since the internal array (table) in the hashMap is of fixed size and if we keep storing objects, at some point the hash function will return the same index location for two different keys. This scenario is called Collision in HashMap. It is for this reason that we store key and value both in the Map.Entry object because there can be collisions.

To handle collisions, at every index of the table we store a linked list of Entry objects. If we try to retrieve objects from this linked list, we need an extra check to search for the exact key from the linked list. This is done by equals() method.
For this reason, it is mandatory that HashMap keys are immutable for eg: Strings
Since each node contains an entry, HashMap keeps comparing entry’s key object with the passed key using equals() and when it returns true, Map returns the corresponding value.

Complexity of HashMap:

In the worst case, because of hash collisions, a hashMap can reduce to a linked list. This makes look up to be O(n). This issue is addressed in Java 8. The linked list implementation is replaced by a tree implementation which makes the lookup for hashMap in worst case in Java 8 as O(logn). Whereas with a strong hashCode() method implementation which means minimal collisions, the look up time for a HashMap is O(1).


Following is the code snippet for the Entry class which is a linked-list node and a hashMap is actually an array of such nodes. 

1
2
3
4
5
static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;



1
2
3
4
/**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Following is the code snippet of put() method of hashMap from Java 7 Documentation 

 

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34

/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

Following is the code snippet of get() method of hashMap from Java 7 Documentation 
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24



/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

No comments:

Post a Comment