Wednesday, March 18, 2015

Garbage Collection in Java

What is Garbage collection?
Garbage collection is the process by which a program automatically finds and reclaims memory that is no longer used or no longer reachable by the application. This reclamation occurs without programmer assistance. In other words, garbage collection collects objects that have no live strong reference to them. If an object that should have been collected stays in memory because of an unintentional strong reference, the result is known as a memory leak.

Garbage collection can be implemented in various ways. Two common methods are (1) reference counting and (2) the mark-and-sweep algorithm.

Reference Counting: This method tracks how many variables reference an object. Initially, there is only one reference to an object. The reference count increases whenever the variable referencing the object is copied. When a variable referencing the object changes its value or goes out of scope, the object’s reference count is decremented. If the reference count of an object becomes zero, the memory associated with that object is freed. This approach is simple and relatively fast. However, it does not handle circular references. An example of a circular reference is a circular linked list, or the case where A points to B, B points to A, and nothing external points to either of them. Both A and B have a non-zero reference count, but neither is accessible from the application because no reference points to them from outside. Ideally their memory could safely be freed, but reference-count-based garbage collection won’t free it.
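
The circular-reference problem can be sketched with a small toy model. This is not how the JVM manages memory; the class RefCountDemo and the helpers retain() and release() are names made up for this illustration. The sketch counts references by hand and shows that A and B are freed when they form a simple chain, but are both left behind when they point to each other.

```java
// Hypothetical toy model of reference counting; the JVM itself does not work this way.
class RefCountDemo {
    static class Node {
        final String name;
        int refCount = 0;      // number of references pointing at this node
        Node next;             // the single outgoing reference of this node
        boolean freed = false;

        Node(String name) { this.name = name; }
    }

    // A new reference to the target was created.
    static void retain(Node target) {
        target.refCount++;
    }

    // A reference to the target went away; free the target at count zero.
    static void release(Node target) {
        target.refCount--;
        if (target.refCount == 0) {
            target.freed = true;
            if (target.next != null) {
                release(target.next);  // a freed object drops its own outgoing reference
            }
        }
    }

    // Builds A -> B (and B -> A when cyclic), drops the external references,
    // and returns how many of the two nodes were never freed.
    static int leakedAfterRelease(boolean cyclic) {
        Node a = new Node("A");
        Node b = new Node("B");
        retain(a);                  // an external variable refers to A
        retain(b);                  // an external variable refers to B
        a.next = b;
        retain(b);                  // A points to B
        if (cyclic) {
            b.next = a;
            retain(a);              // B points back to A
        }

        release(a);                 // external variable for A goes out of scope
        release(b);                 // external variable for B goes out of scope

        int leaked = 0;
        if (!a.freed) leaked++;
        if (!b.freed) leaked++;
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println("leaked without cycle: " + leakedAfterRelease(false)); // 0
        System.out.println("leaked with cycle:    " + leakedAfterRelease(true));  // 2
    }
}
```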

Mark and Sweep: In the first pass, the memory manager marks all the objects that can be accessed by any thread in the program. In the second pass, all unmarked objects are de-allocated (swept). This approach handles circular references. It is typically more expensive than reference counting, since each collection has to walk the object graph. The GC runs at different points in the application’s execution and may cause the application to pause while it is running.
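
The two passes can be sketched on a toy heap as follows. The class MarkSweepDemo, its Obj type and the list standing in for the heap are assumptions made for this illustration; a real JVM collector is far more involved. The example builds one root that references a live object, plus an unreachable A and B that point to each other, and shows that the cycle is swept.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical toy heap used only to illustrate the two passes.
class MarkSweepDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();  // outgoing references
        Obj(String name) { this.name = name; }
    }

    // First pass: mark every object reachable from the roots.
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = new HashSet<>();
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (marked.add(current)) {          // true only the first time an object is seen
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Second pass: remove every unmarked object from the heap.
    // Returns the names of the swept objects in heap order.
    static List<String> sweep(List<Obj> heap, Set<Obj> marked) {
        List<String> swept = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (!marked.contains(o)) {
                swept.add(o.name);
                it.remove();
            }
        }
        return swept;
    }

    static List<String> collectExample() {
        Obj root = new Obj("root");
        Obj live = new Obj("live");
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        root.refs.add(live);   // reachable from the root
        a.refs.add(b);         // A and B only reference each other
        b.refs.add(a);
        List<Obj> heap = new ArrayList<>(Arrays.asList(root, live, a, b));
        Set<Obj> marked = mark(Arrays.asList(root));
        return sweep(heap, marked);
    }

    public static void main(String[] args) {
        System.out.println("swept: " + collectExample()); // swept: [A, B]
    }
}
```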

Java objects are created on the heap. In the HotSpot JVM, memory is divided into three main areas:
  1. Young Generation: for newly created objects.
  2. Tenured Generation: for old objects that have survived minor garbage collections.
  3. Permanent Generation (PermGen): for class definitions, metadata and the string pool.



The young generation is further divided into three parts known as Eden space, Survivor 1 and Survivor 2. When an object is first created, it is allocated in the young generation inside the Eden space. If the object survives a minor garbage collection, it is copied into one of the survivor spaces, and on subsequent minor collections the survivors are copied back and forth between Survivor 1 and Survivor 2. Objects that survive enough minor collections are promoted to the old, or tenured, generation. The permanent generation is special: it stores metadata related to classes and methods in the JVM and, up to Java 6, the string pool (from Java 7 onward the string pool lives in the main heap).
Note: PermGen was removed in Java SE 8 and replaced by Metaspace. One of the most dreaded errors in Java, “java.lang.OutOfMemoryError: PermGen space”, no longer occurs from Java 8 onward. By default Metaspace has no fixed upper limit; it is allocated from native memory, so it is bounded only by the memory available on the system.
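
One way to see these areas on a running JVM is to list its memory pools through the standard java.lang.management API. The sketch below is only an illustration (the class name MemoryPoolsDemo is made up), and the pool names it prints depend on the JVM version and the collector in use, so no particular output should be expected. On Java 7 and earlier a PermGen pool typically appears, and on Java 8 and later a Metaspace pool appears instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; pool names vary by JVM version and collector.
class MemoryPoolsDemo {
    // Returns one description line per memory pool known to this JVM.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();  // null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            lines.add(pool.getName() + " [" + pool.getType() + "] used=" + usedKb + " KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```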

Important points regarding Java Garbage Collection:
  • Garbage collection relieves the Java programmer of memory management, so the programmer can focus more on the business logic.
  • Garbage collection in Java is done by a daemon thread called the Garbage Collector.
  • Because it is an automatic process, programmers need not worry about calling garbage collection in their code. However, System.gc() and Runtime.getRuntime().gc() can be called explicitly if the programmer wants to initiate the garbage collection process. These calls are only requests, and the JVM can choose to ignore them, so calling them does not guarantee that a collection takes place. The JVM decides when to collect based on the state of the heap, such as the space available in Eden.
  • Before removing an object from memory, the garbage collection thread invokes the finalize() method of that object, which gives it an opportunity to perform any cleanup required.
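
As a rough illustration of the point about System.gc(), the sketch below allocates some short-lived arrays, drops them, requests a collection and compares the used heap before and after. The class and method names are made up for this example. Because the call is only a request, the numbers vary from run to run and from JVM to JVM, and nothing guarantees that the second value is smaller.

```java
// Hypothetical demo; results depend on the JVM, its flags and the collector in use.
class GcRequestDemo {
    // Approximate number of heap bytes currently in use.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Allocates about 8 MB of garbage, then asks the JVM to collect.
    // Returns {used bytes before the request, used bytes after the request}.
    static long[] allocateThenRequestGc() {
        byte[][] garbage = new byte[64][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[128 * 1024];
        }
        long before = usedHeapBytes();
        garbage = null;   // drop the only strong reference to the arrays
        System.gc();      // a request only; Runtime.getRuntime().gc() is equivalent
        long after = usedHeapBytes();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] result = allocateThenRequestGc();
        System.out.println("used before request: " + result[0] / 1024 + " KB");
        System.out.println("used after request:  " + result[1] / 1024 + " KB");
    }
}
```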

When does an object become eligible for garbage collection?
  • Any instance that cannot be reached by a live thread.
  • Circularly referenced instances that cannot be reached from any other live instance.
  • An object whose references have all been explicitly set to null.
  • Objects created in a local scope become eligible once the variables referring to them go out of scope, provided nothing else still refers to them.
  • An instance that is reachable through a strong reference is never eligible for garbage collection.
  • Objects that are only softly referenced are collected as a last resort, before the JVM throws an OutOfMemoryError.
  • Objects reachable only through weak or phantom references are eligible for garbage collection.
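
The difference between strong and weak references can be sketched with java.lang.ref.WeakReference. The class and method names below are made up for this example. The first method is deterministic: as long as a strong reference exists, the weak reference keeps returning the object. The second method usually returns true on HotSpot, but since System.gc() is only a request, that result is not guaranteed.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo of strong versus weak reachability.
class ReachabilityDemo {
    // While a strong reference exists, the object cannot be collected,
    // so the weak reference still returns it even after a GC request.
    static boolean survivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        return weak.get() == strong;  // 'strong' is still in use here
    }

    // Once the only strong reference is cleared, the object is eligible.
    // Whether it has actually been collected by now is up to the JVM.
    static boolean clearedAfterGcRequest() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        strong = null;
        System.gc();
        return weak.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("alive while strongly held: " + survivesWhileStronglyHeld());
        System.out.println("cleared after GC request:  " + clearedAfterGcRequest());
    }
}
```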

Types of Garbage Collectors: Java has four types of garbage collectors.
1. Serial Garbage Collector
2. Parallel Garbage Collector
3. Concurrent Mark and Sweep Garbage Collector
4. G1 Garbage Collector

Each of these four types has its own advantages and disadvantages. The programmer can choose which garbage collector the JVM uses by passing the choice as a JVM argument.

Types of Garbage Collector


1. Serial Garbage Collector: As the name suggests, the serial collector uses a single thread to perform garbage collection. It executes the mark-and-sweep algorithm in that single thread, which makes it efficient because there is no communication overhead between threads. It works by freezing all the application threads while it does its garbage collection job. It is intended for applications with small data sets. In the Oracle HotSpot JVM the serial GC is selected by default on client-class machines, and it can be explicitly enabled with the option -XX:+UseSerialGC.

2. Parallel Collector: As its name suggests, this collector uses multiple threads in parallel. It is also known as the throughput collector. Because it spreads the work across multiple CPUs, it can decrease the GC pause time. It is intended for medium to large data sets running on multi-processor machines. Like the serial GC, it freezes all the application threads while performing garbage collection. It can be explicitly enabled with the option -XX:+UseParallelGC.

3. Concurrent Mark and Sweep Garbage Collector (CMS): This GC performs most of its work concurrently, i.e. while the application is running. The primary motive of this technique is to keep GC pauses to a minimum. It is designed for applications with medium to large data sets. Multiple threads scan the heap memory to mark instances for eviction and then sweep the marked instances. CMS pauses the application in two scenarios: (a) while marking the referenced objects in the tenured generation and (b) if the heap changes in parallel while garbage collection is in progress. It uses more CPU than the parallel GC in exchange for shorter pauses and better application responsiveness. If more CPUs can be allocated, CMS is a preferred choice over the parallel collector. It can be explicitly enabled with the option -XX:+UseConcMarkSweepGC.

4. G1 (Garbage First): This technique splits the heap space into fixed-size regions and tracks the live data in those regions. For each region it keeps a set of pointers into and out of the region, known as the “remembered set”. When a collection becomes necessary, it collects the regions with the least live data first, which is why it is called “garbage first”. Often this means collecting an entire region in one step: if the number of pointers into a region is zero, then it doesn’t need to do a mark or sweep of that region. It can be explicitly enabled with the option -XX:+UseG1GC.
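
To check which collector a JVM is actually running with, one option is to list its garbage collector beans through the standard java.lang.management API. The sketch below is only an illustration (the class name GcInfoDemo is made up), and the names it prints depend on the JVM version and on the flag passed at startup.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the reported names differ between collectors and JVM versions.
class GcInfoDemo {
    // Returns one line per collector bean registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : collectorNames()) {
            System.out.println(name);
        }
    }
}
```

Assuming the class is compiled as GcInfoDemo, running it as "java -XX:+UseSerialGC GcInfoDemo" and then as "java -XX:+UseG1GC GcInfoDemo" should print different collector names, which is a quick way to confirm that the JVM argument took effect.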
