Geek In Me!: March 2015

Wednesday, March 18, 2015

Garbage Collection in Java

What is Garbage collection?

Garbage collection is the process by which a program automatically finds and reclaims memory that is no longer used or no longer accessible by the application. This reclamation occurs without programmer assistance. In other words, garbage collection is a process of collecting objects, which have no live strong reference to them. If an object, which is supposed to be collected but still live in memory due to unintentional strong reference then its known as memory leak.

There are various ways garbage collection can be implemented. Two of the methods of garbage collection are 1. Reference Counting and 2. Mark and Sweep algorithm.

Reference Counting: This method involves tracking how many variables reference an object. Initially, there will be only one reference to an object. The reference count will increase if the variable referencing the object is copied. When a variable referencing an object changes its value or goes out of scope, the object’s reference count is decremented. If the reference count of an object becomes zero, the memory associated with that object is freed. This approach is simple and is relatively fast. However, it does not handle circular references. Example of circular references is a circular linked list or for eg: A points to B and B points to A and there is nothing external pointing to them. Both A and B have non-zero reference count but they are not accessible from the application as there is no reference pointing to either of them from outside. Ideally, memory could safely be freed, but the reference-count based garbage collection wont free it.

Mark and Sweep: In the first pass, the memory manager will mark all the objects that can be accessed by any thread in the program. In the second pass, all unmarked objects are de-allocated/swept. This approach handles circular references. This approach is less efficient than the reference counting. The GC runs at different points in the application’s execution and may cause the application to pause while gc is running.

Java objects are created in Heap and Heap is divided in mainly 3 areas:

1 Young Generation: for newly created objects
2 Tenured Generation: for old objects which survived after minor garbage collection.
3 Permanent Generation (Permgen): for class definition, meta-data and string pools.

Young generation is further divided into three parts known as Eden space, Survivor 1 and Survivor 2. When an object first gets created in heap, it gets created in the young generation inside the Eden space. After subsequent minor garbage collection, if the object survives, it gets moved to the survivor1 and then to survivor 2 before major garbage collection moves the object to old or tenured generation. Permanent generation is special and it is used to store string pool and meta-data related to classes and methods in JVM.

Note: Permgen space is removed from Java SE 8 features and instead we have MetaSpace introduced. One of the most dreaded errors in Java “java.lang.OutOfMemoryError: PermGen error” will no longer be seen from Java 8. Nice thing is that MetaSpace default is unlimited and that the system memory itself becomes the memory.

Important points regarding Java Garbage Collection:

1 Garbage collection relieves Java programmer from memory management so the programmer can focus more on the business logic.
2 Garbage collection in Java is done by a daemon thread called Garbage Collector.
3 Being an automatic process, programmers need not worry about calling garbage collection in the code. However, System.gc() and Runtime.gc() are the methods which can be called in code explicitely, if the programmer needs to initiate garbage collection process. Although these methods exists and provide an opportunity for the programmer to start the gc process, but JVM can choose to reject this request. This means that on calling these methods for garbage collection, it is not guaranteed that these calls will do the garbage collection. This decision is taken by the JVM based on the eden space available in the heap memory.
4 Before removing an object from memory, garbage collection thread invokes finalize() method of that object and gives an opportunity to perform any sort of cleanup required.

When does an object become eligible for garbage collection?

1. Any instance that cannot be reached by a live thread.
2. Circularly referenced instances that cannot be reached by any other instances.
3. When all references to an object are explicitly set to null.
4. Local variables/objects created in local scope, after they go out of scope they are eligible for garbage collection.
5. An instance having strong reference is never eligible for garbage collection.
6. For soft references, garbage collection will be done as a last option.
7. Weak and phantom references are eligible for garbage collection.

Types of Garbage Collectors: Java has four types of Garbage collectors.

1. Serial Garbage Collector

2. Parallel Garbage Collector

3. Concurrent Mark and Sweep Garbage Collector

4. G1 Garbage Collector

All of these four types have their own advantages and disadvantages. The programmers can choose the type of garbage collector to be used by the JVM. Passing the choice as JVM argument does this.

Types of Garbage Collector

1. Serial Garbage Collector: As the name suggests, the serial collector uses a single thread to perform the garbage collection. It executes mark-and-sweep algorithm in a single thread. This makes it efficient since there is no communication overhead between the threads. It means that it works by freezing all the application threads while it does its garbage collection job. It is intended for applications with small data sets. The serial GC is selected by default in the Oracle HotSpot JVM or it can be explicitly enabled with the option –XX:+UseSerialGC.

2. Parallel Collector: This one makes use of parallel threads as its name suggests. It is also known as the throughput collector. Because of parallel threads, it can decrease the GC pause time by utilizing the multiple CPUs. It is intended for medium to large sized data sets that are run on multi-processors. Like serial GC this also freezes all the application threads while performing garbage collection. It can be explicitly enabled with the option –XX:+UseParallelGC.

3. Concurrent Mark and Sweep Garbage Collector (CMS): This GC performs most of its work concurrently. i.e while the application is running. The primary motive of this technique is to keep the GC pauses to minimum. It is designed for application with medium to large sized data sets. Multiple threads scan the heap memory to mark instances for eviction and thens weep the marked instances. CMS holds the application in two scenarios: a. while marking the referenced objects in the tenured generation and b. if there is a change in heap memory in parallel while doing garbage collection. It uses more CPU than Parallel GC to provide better throughput. If we can allocate more CPUs then CMS is a preferred choice over parallel collector. It can be explicitly enabled with option XX:+UseConcMarkSweelGC.

4. G1 (Garbage First): This technique splits the heap space into fixed-size regions and tracks the live data in those regions. It keeps a set of pointers that is known as the “remembered set” into and out of the region. When running GC becomes important, it collects the regions with less live data first. So, it is called “garbage first”. Often this means collecting an entire region in one step: if the number of pointers into a region is zero, then it doesn’t need to do a mark or sweep of that region. It can be explicitly enabled with option –XX:UseG1GC.

Thursday, March 5, 2015

How HashMap works

How HashMap works in Java?

HashMap is a data structure that allows us to store object and retrieve it in constant time O(1). HashMap is a key-value pair mapping data structure.

HashMap works on the principle of Hashing. In hashing, hash functions are used to link key and value in HashMap.

What is hash function?

A hash function is a way of assigning a unique code for any variable/object after applying some formula/algorithm on its attributes/properties. The hashCode() method defined in the Object class which returns an integer value is a Hash function.

Read previous post on hashCode() method here. In brief, a hash function should return the same hash code each and every time when the function is called for same/equal objects.

In a hashMap, objects are retrieved using get(key) method and objects are stored using put(key, value) method. When we call put(k,v) method, hashCode() method of key ‘k’ is called which returns an integer value. This integer value is the index of an internal array that is known as a table. This is where the key,value mapping (k,v) is stored inside a hashmap. HashMap internally stores mapping in the form of Map.Entry object which contains both key and value object. To retrieve the object, we call get(k) method. Again, the key objects’ hashCode() method is called which generates the same hash code(an integer value) and we look up at the same index in the internal array to retrieve our information for key k. If there is only one value stored at that index, we return that value. If multiple values are stored at that index, it gets little different to handle such situations, which we call Collisions. Since the internal array (table) in the hashMap is of fixed size and if we keep storing objects, at some point the hash function will return the same index location for two different keys. This scenario is called Collision in HashMap. It is for this reason that we store key and value both in the Map.Entry object because there can be collisions.

To handle collisions, at every index of the table we store a linked list of Entry objects. If we try to retrieve objects from this linked list, we need an extra check to search for the exact key from the linked list. This is done by equals() method.

For this reason, it is mandatory that HashMap keys are immutable for eg: Strings

Since each node contains an entry, HashMap keeps comparing entry’s key object with the passed key using equals() and when it returns true, Map returns the corresponding value.

Complexity of HashMap:

In the worst case, because of hash collisions, a hashMap can reduce to a linked list. This makes look up to be O(n). This issue is addressed in Java 8. The linked list implementation is replaced by a tree implementation which makes the lookup for hashMap in worst case in Java 8 as O(logn). Whereas with a strong hashCode() method implementation which means minimal collisions, the look up time for a HashMap is O(1).

Following is the code snippet for the Entry class which is a linked-list node and a hashMap is actually an array of such nodes.

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

/**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

Following is the code snippet of put() method of hashMap from Java 7 Documentation

/**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

Following is the code snippet of get() method of hashMap from Java 7 Documentation

/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        Entry<K,V> entry = getEntry(key);

        return null == entry ? null : entry.getValue();
    }

Wednesday, March 4, 2015

Static code Analyzers (FindBugs)

Recently, I have heard a lot of buzz about the static code analyzers. There are a bunch of static code analyzers out there and I decided to get started on one of such analyzer for Java. You can find the list of static java code analyzers here.

FindBugs?

FindBugs is an open source tool for static analysis of Java programs. i.e It can find bugs simply by inspecting program’s code. It scans java byte code rather than source code and identifies defects/suspicious code in Java programs. Findbugs needs the compiled class files. But, there is no need to execute the code for the analysis. It is based on the concept of bug patterns. A bug pattern is a code idiom that is often an error.

The potential bugs are classified in four ranks:

1. Scariest

2. Scary

3. Troubling

4. Of concern

Each of these categories is further classified into High Confidence and Normal Confidence. The findings from FindBugs are reported as warnings that is “potential bugs” but not all warnings are necessarily errors/bugs/defects. It is a hint to the developers about their possible impact/severity. According to the developers of findBugs, the rate of false warnings reported by FindBugs is less than 50%. FindBugs is written in Java and can be run with any virtual machine compatible with Sun’s JDK1.5. It uses BCEL (Byte Code Engineering Library by Apache Commons) to analyze Java code.

FindBugs supports plugin architecture and there are plugins available for Eclipse, NetBeans, IntelliJ, Hudson, Jenkins etc.

Here are a few screenshots of what kind of bugs were caught by FindBugs in Java in my environment. (IDE: Eclipse Kepler; Java version: Java 7)

Click the screenshots for a better view

Categorizing the bugs in the 4 categories

Possible NPE

Comparison of objects using == instead of equals()

Using Thread.sleep() with lock held

Unused local variable

If you notice the screen shots carefully, the bug info section gives a detailed explanation of why the tool smells a bug at the line that it indicates has a bug-pattern. The bug info contains useful information for the developer to understand the reason of suspecting it as a bug and deciding if it is really a bug or what the developer intended to do but is bad practice. It also gives the rank, confidence, Pattern, Type, and Category of bug pattern.

There are around 400 bug patterns reported by findBugs in nine different categories:

1. Correctness

2. Bad Practice

3. Dodgy Code

4. Multithreaded Correctness

5. Performance

6. Malicious Code Vulnerability

7. Security

8. Experimental

9. Internationalization

It is very important to bear in mind that not all issues a static analyzer finds are actually bugs. There are some coding preferences, which are a matter of style that developers adopt even though they are not the most efficient. In case a developer doesn’t agree to a warning by findBugs, the solution to this is to suppress warnings using annotations. FindBugs supports several annotations to express the developer’s intent so that FindBugs can issue warnings more appropriately. Visit this page for details on annotations.

More information about this is available at:

http://www.methodsandtools.com/tools/findbugs.php

http://findbugs.sourceforge.net/bugDescriptions.html

Sunday, March 1, 2015

hashCode() and equals() method in Java

Work in progress:

hashCode() and equals() methods are defined in Object class and so all java objects inherit a default implementation of these two methods.

equals() simply checks for equality of two objects. The default implementation just checks for the object references of two objects to verify their equality.

hashCode() returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by implementation of Map interface. The general contract of hashCode() is

Calling hashCode() on the same method multiple times should return the same hashCode value each and every single time. (provided no information used in equals comparisons on the object is modified.) This integer need not be consistent across different executions of the application.
If two objects are equal as per the equals() method, then both of them should return the same integer value.
It is not necessary that two unequal objects as per equals() method will produce two distinct integer results. (However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.)