What is Garbage Collection?
Garbage collection is the process by which a program automatically finds and reclaims
memory that the application no longer uses or can no longer reach. This reclamation occurs without programmer
assistance. In other words, garbage collection collects objects that have no live
strong reference to them. If an object that should have been collected stays in
memory because of an unintentional strong reference, the result is known as a
memory leak.
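To make the idea of an unintentional strong reference concrete, here is a minimal sketch. The class name `LeakSketch`, the `CACHE` field and the `handleRequest` method are hypothetical names chosen for illustration; the pattern shown (a static collection that keeps growing) is only one common way such a leak can arise.

```java
import java.util.ArrayList;
import java.util.List;

class LeakSketch {
    // Hypothetical cache: a static collection stays reachable for as long as the
    // class is loaded, so everything stored in it stays strongly referenced.
    static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024]; // only needed during this call
        CACHE.add(buffer);              // unintentional strong reference: buffer is never released
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        // Every buffer is still reachable through CACHE, so none can be collected.
        System.out.println("Buffers still reachable: " + CACHE.size());
    }
}
```

The garbage collector cannot help here: as far as it can tell, every buffer is still in use because the list refers to it.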
There are various ways to implement garbage collection. Two common methods are
1. Reference Counting and 2. the Mark and Sweep algorithm.
Reference Counting: This method
tracks how many references point to an object. Initially, there is
only one reference to an object. The reference count increases whenever a
variable referencing the object is copied. When a variable referencing an
object changes its value or goes out of scope, the object's reference count is
decremented. If the reference count of an object becomes zero, the memory
associated with that object is freed. This approach is simple and relatively
fast. However, it does not handle circular references. An example is a circular
linked list, or the case where A points to B, B points to A, and nothing
external points to either of them. Both A and B have a non-zero
reference count, yet neither is accessible from the application. Their memory
could safely be freed, but a collector based purely on reference counting won't
free it.
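The circular reference problem can be reproduced with a toy model. The sketch below is not how any real runtime implements reference counting; `RefCountSketch`, `Node`, `addRef`, `release` and `link` are hypothetical names, and the "heap" is just ordinary Java objects carrying a counter.

```java
class RefCountSketch {
    // A toy heap object that carries its own reference count.
    static class Node {
        final String name;
        int refCount = 0;      // how many references currently point at this node
        Node next;             // at most one outgoing reference, to keep the sketch small
        boolean freed = false; // stands in for "memory returned to the allocator"

        Node(String name) { this.name = name; }
    }

    // A new reference to n was created.
    static void addRef(Node n) { n.refCount++; }

    // A reference to n went away; free n once nothing points at it any more.
    static void release(Node n) {
        n.refCount--;
        if (n.refCount == 0) {
            n.freed = true;
            Node child = n.next;
            n.next = null;
            if (child != null) release(child); // freeing n drops its outgoing reference
        }
    }

    // Make 'from' point at 'to'.
    static void link(Node from, Node to) {
        from.next = to;
        addRef(to);
    }

    public static void main(String[] args) {
        // Acyclic case: C -> D, and each is also held by one external variable.
        Node c = new Node("C"), d = new Node("D");
        addRef(c); addRef(d);
        link(c, d);
        release(d); release(c); // both external variables go out of scope
        System.out.println("C freed: " + c.freed + ", D freed: " + d.freed); // C freed: true, D freed: true

        // Cyclic case: A -> B and B -> A, each also held by one external variable.
        Node a = new Node("A"), b = new Node("B");
        addRef(a); addRef(b);
        link(a, b); link(b, a);
        release(a); release(b); // both external variables go out of scope
        System.out.println("A freed: " + a.freed + ", B freed: " + b.freed); // A freed: false, B freed: false
    }
}
```

In the cyclic case each node is left with a count of 1 that comes from the other node, so neither count ever reaches zero.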
Mark and Sweep: In the first pass, the
memory manager marks all the objects that can still be reached by any thread in
the program. In the second pass, all unmarked objects are de-allocated (swept).
This approach handles circular references, because an unreachable cycle is simply
never marked. It is generally more expensive than reference counting, since each
collection has to traverse the reachable objects. The GC runs at different points
in the application's execution and may cause the application to pause while it is running.
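The two passes can be sketched on a toy object graph. This is an illustration of the idea only, not the JVM's implementation; `MarkSweepSketch`, `Obj`, `mark` and `sweep` are hypothetical names, and the "heap" is a plain list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {
    // A toy heap object with outgoing references and a mark bit.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) { this.name = name; }
    }

    // Pass 1: mark every object reachable from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.marked) {
                o.marked = true;
                pending.addAll(o.refs);
            }
        }
    }

    // Pass 2: remove unmarked objects from the heap and clear marks for the next cycle.
    // Returns the names of the objects that were swept.
    static List<String> sweep(List<Obj> heap) {
        List<String> swept = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                swept.add(o.name);
                it.remove();
            }
        }
        return swept;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        root.refs.add(a);               // root -> a is reachable
        b.refs.add(c); c.refs.add(b);   // b <-> c is an unreachable cycle
        List<Obj> heap = new ArrayList<>(List.of(root, a, b, c));

        mark(List.of(root));
        System.out.println("Swept: " + sweep(heap)); // Swept: [b, c]
    }
}
```

Unlike the reference counting approach, the cycle between b and c is reclaimed because neither object is ever reached during marking.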
Java objects are created in the heap, and the heap is divided into three main areas:
- 1 Young Generation: for newly created objects.
- 2 Tenured Generation: for older objects that have survived minor garbage collections.
- 3 Permanent Generation (PermGen): for class definitions, metadata and the string pool.
The young generation is further divided into three parts known as Eden space,
Survivor 1 and Survivor 2. When an object is first created in the heap, it is
created in the Eden space of the young generation. If it survives a minor garbage
collection, it is moved to one of the survivor spaces, and it is then copied
between Survivor 1 and Survivor 2 on each subsequent minor collection. Once it has
survived enough minor collections, it is promoted to the old or tenured
generation, which is cleaned during a major garbage collection. The permanent
generation is special: it stores the string pool and the metadata related to
classes and methods in the JVM.
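One way to see these areas on a running JVM is through the standard `java.lang.management` API. The sketch below lists the memory pools the JVM reports; the class name `MemoryPoolsSketch` and the method `describePools` are made up for this example, and the pool names printed depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsSketch {
    // One line per memory pool: type (HEAP or NON_HEAP), name, and current usage.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedKb = (usage == null) ? 0 : usage.getUsed() / 1024;
            lines.add(pool.getType() + " | " + pool.getName() + " | used " + usedKb + " KB");
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```

On a typical HotSpot JVM the output includes pools for Eden, the survivor space and the old generation, but the exact names are not fixed.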
Note: The PermGen space was removed in
Java SE 8 and replaced by Metaspace. One of the most
dreaded errors in Java, "java.lang.OutOfMemoryError: PermGen space", is no
longer seen from Java 8 onwards. By default the size of Metaspace is unlimited,
so it is bounded only by the native memory available on the system. It can be
capped with the option -XX:MaxMetaspaceSize.
Important points regarding Java Garbage Collection:
- 1 Garbage collection relieves the Java programmer of manual memory management, so the programmer can focus more on the business logic.
- 2 Garbage collection in Java is done by a daemon thread called the Garbage Collector.
- 3 Because it is an automatic process, programmers need not worry about invoking garbage collection in their code. However, System.gc() and Runtime.getRuntime().gc() can be called explicitly if the programmer wants to request a garbage collection. Although these methods exist, the JVM is free to ignore the request, so calling them does not guarantee that a collection actually runs. The JVM makes this decision itself, based on factors such as the space available in the heap.
- 4 Before removing an object from memory, the garbage collection thread invokes the finalize() method of that object, which gives it an opportunity to perform any cleanup required. Note that finalize() has been deprecated since Java 9.
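The "request, not command" nature of System.gc() can be observed with a small experiment. This is a rough sketch: `GcRequestSketch`, `usedBytes` and `requestGcAndMeasure` are hypothetical names, and the number printed varies from run to run and from JVM to JVM.

```java
class GcRequestSketch {
    // Heap bytes currently in use, as reported by the runtime.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Ask for a collection and report how many bytes were released.
    // The result may be zero or negative: the JVM is free to ignore the request.
    static long requestGcAndMeasure() {
        long before = usedBytes();
        System.gc(); // same effect as Runtime.getRuntime().gc()
        return before - usedBytes();
    }

    public static void main(String[] args) {
        byte[][] garbage = new byte[10][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1 << 20]; // 1 MB each
        }
        garbage = null; // the arrays are now unreachable
        System.out.println("Bytes released after request: " + requestGcAndMeasure());
    }
}
```

Since nothing guarantees that a collection actually ran, application logic should never depend on the value this returns.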
When does an object become eligible for garbage collection?
- 1. Any instance that cannot be reached from a live thread.
- 2. Circularly referenced instances that cannot be reached from any other live instance.
- 3. An object whose references have all been explicitly set to null.
- 4. Objects created in a local scope become eligible once their local variables go out of scope, provided no other reference to them remains.
- 5. An instance that is reachable through a strong reference is never eligible for garbage collection.
- 6. Objects reachable only through soft references are collected as a last resort, when the JVM is running low on memory.
- 7. Objects reachable only through weak or phantom references are eligible for garbage collection.
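The reference kinds mentioned in the list above live in the standard `java.lang.ref` package. The following sketch shows how they are created and queried; `ReferenceKindsSketch` and the `state` helper are hypothetical names, and what the last two lines print depends on whether the JVM honours the collection request.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsSketch {
    // Reports whether the referent of a weak or soft reference is still available.
    static String state(Reference<?> ref) {
        return ref.get() == null ? "cleared" : "still reachable";
    }

    public static void main(String[] args) {
        Object strong = new Object();                              // strong reference
        WeakReference<Object> weak = new WeakReference<>(strong);  // does not keep the object alive
        SoftReference<Object> soft = new SoftReference<>(new Object());
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);

        System.out.println(weak.get() == strong); // true: the strong reference keeps the object alive
        System.out.println(phantom.get());        // null: a phantom reference never returns its referent

        strong = null; // the first object is now only weakly reachable
        System.gc();   // a request only; the JVM may ignore it
        System.out.println("weak: " + state(weak));
        System.out.println("soft: " + state(soft));
    }
}
```

On many JVMs the weak reference is reported as cleared after the request while the soft one survives, but neither outcome is guaranteed.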
Types of Garbage Collectors: Java has four types of garbage collectors.
1. Serial Garbage Collector
2. Parallel Garbage Collector
3. Concurrent Mark and Sweep Garbage Collector
4. G1 Garbage Collector
Each of these four types has its own advantages and disadvantages. The programmer
can choose which garbage collector the JVM uses by passing the choice as a JVM
argument.
Figure: Types of Garbage Collector
1. Serial Garbage Collector: As the name
suggests, the serial collector uses a single thread to perform garbage
collection. It executes the mark-and-sweep algorithm in that single thread, which
keeps it efficient since there is no communication overhead between threads. It
works by freezing all the application threads while it does its garbage
collection job. It is intended for applications with small data sets. The HotSpot
JVM typically selects the serial GC by default on small, client-class machines
(for example, a single CPU), and it can be explicitly enabled with the
option -XX:+UseSerialGC.
2. Parallel Collector: As its name suggests, this one makes use
of parallel threads. It is also known as the throughput
collector. Because it collects with several threads, it can decrease the GC pause
time by utilizing multiple CPUs. It is intended for medium to large sized data sets
that are run on multi-processor machines. Like the serial GC, it freezes all the
application threads while performing garbage collection. It can be explicitly
enabled with the option -XX:+UseParallelGC.
3. Concurrent Mark and
Sweep Garbage Collector (CMS): This GC performs most of its work
concurrently, i.e. while the application is running. The primary motive of this
technique is to keep the GC pauses to a minimum. It is designed for applications
with medium to large sized data sets. Multiple threads scan the heap memory to mark
instances for eviction and then sweep the marked instances. CMS pauses the
application in two scenarios: a. while marking the referenced objects in the
tenured generation and b. if the heap memory changes in parallel while
garbage collection is in progress. It uses more CPU than the parallel GC in order
to provide better application responsiveness. If we can allocate more CPUs, then
CMS is a preferred choice over the parallel collector. It can be explicitly enabled
with the option -XX:+UseConcMarkSweepGC. Note that CMS was deprecated in Java 9
and removed in Java 14.
4. G1 (Garbage First):
This technique splits the heap space into fixed-size regions and tracks the
live data in those regions. For each region it keeps a set of pointers, known as the
“remembered set”, that records the references pointing into that region. When a
collection is needed, it collects the regions with the least live data first,
which is why it is called “garbage first”. Often this means collecting
an entire region in one step: if the number of pointers into a region is zero,
then it doesn't need to do a mark or sweep of that region. It can be explicitly enabled with the option -XX:+UseG1GC.
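To check which collector the JVM actually picked for a given set of options, the standard `GarbageCollectorMXBean` API can be queried at runtime. This is a small sketch; `CollectorInfoSketch` and `collectorNames` are names made up for the example, and the bean names it prints differ between collectors and JVM versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class CollectorInfoSketch {
    // Names of the garbage collectors active in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 when undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running the same class with different options, for example -XX:+UseSerialGC and then -XX:+UseG1GC, should show different collector names, assuming the JVM in use supports both.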

