How Garbage Collection Works
Understanding garbage collection (GC) is essential to understanding how programming languages with automatic memory management, such as Java, Python, and the .NET languages, manage memory. In this post, let’s take a closer look inside the garbage collector: how it works, and the important features that every software engineer should be aware of.
How Garbage Collectors Work
1. Memory Allocation
When objects are created, they are allocated memory on the heap, the region of memory used for dynamic allocation. In programming languages such as Java or Python, heap memory is managed automatically by the runtime and its Garbage Collector. That means the GC is responsible for the lifecycle of the allocated memory, including reclaiming it once it is no longer in use, which prevents most memory leaks.
A few fundamental concepts sit underneath the memory allocation phase, all of them related to virtual memory. Feel free to skip to the next section if you already know them.
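To make the heap a little more concrete, here is a minimal sketch that asks the JVM how large its heap currently is. The class name HeapStatsSketch and the method snapshot() are made up for this illustration; the Runtime calls are standard Java. The "used" figure is only an approximation, since the two readings are taken at slightly different moments.

```java
public class HeapStatsSketch {
    // Returns {used, committed, max} heap sizes in bytes, as reported by the JVM.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory();        // heap the JVM currently holds
        long used = committed - rt.freeMemory();  // part occupied by objects (approximate)
        long max = rt.maxMemory();                // upper bound the heap may grow to
        return new long[] { used, committed, max };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.printf("used=%d KiB, committed=%d KiB, max=%d KiB%n",
                s[0] / 1024, s[1] / 1024, s[2] / 1024);
    }
}
```

The numbers depend on the JVM, its flags, and the machine, so no particular output should be expected.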
- Abstraction Layer: Virtual memory acts as an abstraction layer between the physical memory and the application’s memory requests. When an application allocates memory on the heap, it is actually allocating virtual memory.
- Page Mapping: The operating system manages a page table to map virtual addresses to physical addresses. When a program accesses a memory address, this mapping determines the actual physical memory location.
- Swapping: If the system runs out of physical memory, the operating system can move some of the data stored in RAM to disk (swap space or paging file), freeing up RAM for other tasks. This swapping can impact performance, especially if the system relies heavily on it.
- Memory Overcommitment: Some operating systems allow for the allocation of more virtual memory than the total physical memory available, under the assumption that not all allocated memory will be used simultaneously. This can lead to more efficient use of physical memory but requires careful management to avoid performance degradation.
2. Reachability Analysis
At the core of garbage collection is the concept of reachability. An object is considered reachable if some potential continuing computation can still access it, starting from any live thread’s stack, from static fields, or from other reachable objects. An object that is not reachable can never be used again, so its memory is eligible for reclamation.
In Java, there are four categories of objects that can serve as GC Roots:
- Objects referenced in the VM Stack;
- Objects referenced by static fields of classes in the Method Area;
- Objects referenced by constants in the Method Area;
- Objects referenced by the Java Native Interface (JNI) in the Native Method Stack.
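The effect of roots can be observed from inside a program. The sketch below is one possible illustration, not a description of how the JVM implements reachability: it uses a WeakReference as a probe, because a weak reference does not keep its target alive. The class and method names are invented for this example, and Reference.reachabilityFence requires Java 9 or later.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

public class ReachabilitySketch {
    // A static field acts as a GC root for whatever it references.
    static final Object STATIC_ROOT = new Object();

    // An object held by a local variable (a stack root) must survive a GC request.
    static boolean localSurvivesGc() {
        Object local = new Object();
        WeakReference<Object> probe = new WeakReference<>(local);
        System.gc();
        boolean alive = probe.get() != null;
        Reference.reachabilityFence(local); // keep 'local' strongly reachable up to here
        return alive;
    }

    // An object held by a static field must survive a GC request as well.
    static boolean staticSurvivesGc() {
        WeakReference<Object> probe = new WeakReference<>(STATIC_ROOT);
        System.gc();
        return probe.get() != null;
    }

    public static void main(String[] args) {
        System.out.println("local root keeps object alive:  " + localSurvivesGc());
        System.out.println("static root keeps object alive: " + staticSurvivesGc());

        Object temp = new Object();
        WeakReference<Object> probe = new WeakReference<>(temp);
        temp = null;   // drop the only strong reference: the object is now unreachable
        System.gc();   // only a hint, so the next line may print true or false
        System.out.println("cleared after losing its root:  " + (probe.get() == null));
    }
}
```

The first two lines always print true, because a strongly reachable object is never collected. The last line usually prints true on HotSpot, but nothing guarantees it.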
3. Mark and Sweep
- Mark Phase: The GC traverses all object references starting from the roots (like global variables), marking each object it encounters as alive, meaning it is still in use.
- Sweep Phase: The GC scans through the heap for objects that weren't marked as alive. These unmarked objects are considered garbage and are reclaimed, freeing up memory for future allocations.
Because the mark phase follows references instead of counting them, this process also reclaims groups of objects that reference each other in a cycle but can no longer be reached from any root.
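The two phases can be sketched with a toy heap. This is a simplified model for illustration only, not how a real JVM lays out or scans memory; the names MarkSweepSketch, Obj, mark, sweep, and collect are all invented here. The "heap" is just a list of objects, and each object carries a mark bit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

public class MarkSweepSketch {
    // A toy heap object: a name, outgoing references, and a mark bit.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag everything reachable from the roots (iterative depth-first walk).
    static void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;   // already visited, also guards against cycles
            o.marked = true;
            for (Obj r : o.refs) stack.push(r);
        }
    }

    // Sweep phase: remove unmarked objects and reset the mark bit on survivors.
    // Returns the names of the reclaimed objects.
    static List<String> sweep(List<Obj> heap) {
        List<String> reclaimed = new ArrayList<>();
        Iterator<Obj> it = heap.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;     // ready for the next cycle
            } else {
                reclaimed.add(o.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    static List<String> collect(List<Obj> heap, List<Obj> roots) {
        mark(roots);
        return sweep(heap);
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b);                // a -> b, reachable from the root a
        c.refs.add(d);
        d.refs.add(c);                // c <-> d form a cycle with no path from a root
        List<Obj> heap = new ArrayList<>(List.of(a, b, c, d));
        System.out.println(collect(heap, List.of(a))); // prints [c, d]
    }
}
```

Note that c and d are reclaimed even though they still reference each other: what matters is reachability from a root, not whether any reference to the object exists.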
4. Stop-the-World Event
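Many GC implementations pause all application threads (a stop-the-world event) while they perform some or all of a GC cycle, so that the object graph does not change underneath the collector. These pauses add latency, which matters most in real-time and other latency-sensitive systems.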
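One rough way to see how much collection work a running JVM has done is the standard java.lang.management API. The sketch below is a minimal example; the class name GcStatsSketch and the helper totalCollections() are invented for it. Keep in mind that the reported time is the approximate accumulated collection time, which for concurrent collectors is not the same thing as stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStatsSketch {
    // Sums the collection counts reported by all collectors of this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, about %d ms in total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total collections so far: " + totalCollections());
    }
}
```

The collector names and counts depend on which GC the JVM was started with. For actual pause durations, GC logging (for example -Xlog:gc on Java 9 and later) is usually the better source.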
5. Generational Collection
Modern garbage collectors often use a generational approach, where the heap is divided into several generations (young, old, and sometimes permanent). The assumption is that most objects die young, so focusing GC efforts on the young generation can be more efficient.
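The idea can be sketched with a toy model. Everything here is hypothetical and heavily simplified: objects are plain names, the caller states which ones are still live, and the promotion threshold of two survived collections is an arbitrary choice for the example. Real collectors decide liveness by tracing and use tunable thresholds.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GenerationalSketch {
    static final int PROMOTION_AGE = 2; // hypothetical threshold for this sketch

    // Young generation: object name -> number of minor collections survived.
    final Map<String, Integer> young = new LinkedHashMap<>();
    // Old generation: objects that lived long enough to be promoted.
    final List<String> old = new ArrayList<>();

    void allocate(String name) {
        young.put(name, 0); // new objects always start in the young generation
    }

    // Minor collection: only the young generation is examined.
    void minorCollect(Set<String> live) {
        young.keySet().retainAll(live); // dead young objects are reclaimed
        Iterator<Map.Entry<String, Integer>> it = young.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> e = it.next();
            int age = e.getValue() + 1;
            if (age >= PROMOTION_AGE) {
                old.add(e.getKey());    // survivor is promoted to the old generation
                it.remove();
            } else {
                e.setValue(age);
            }
        }
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch();
        heap.allocate("cache");
        heap.allocate("tmp1");
        heap.minorCollect(Set.of("cache")); // tmp1 dies young, cache survives once
        heap.allocate("tmp2");
        heap.minorCollect(Set.of("cache")); // tmp2 dies young, cache is promoted
        System.out.println("old:   " + heap.old);            // prints old:   [cache]
        System.out.println("young: " + heap.young.keySet()); // prints young: []
    }
}
```

Both collections touched only the young generation. The long-lived object was moved out of the way after two cycles, so later minor collections no longer need to look at it.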
6. Compaction
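Some garbage collectors compact memory after freeing up space: they move the surviving objects next to each other and update the references that point to them. This reduces memory fragmentation and leaves the free space as one contiguous region, which makes efficient use of space and keeps later allocations simple.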
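A sliding compaction can be sketched on an array that stands in for the heap. This is a toy model with invented names (CompactionSketch, compact), assuming one object per slot and null for a free slot. The returned forwarding table records where each object moved, which is the information a real collector would use to fix up references.

```java
import java.util.Arrays;

public class CompactionSketch {
    // Slides live entries (non-null) to the front of the array, preserving their order.
    // Returns a forwarding table: forwarding[oldIndex] = newIndex, or -1 for a free slot.
    static int[] compact(String[] heap) {
        int[] forwarding = new int[heap.length];
        int next = 0; // next free position at the front
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) {
                forwarding[i] = -1;
                continue;
            }
            forwarding[i] = next;
            if (next != i) {
                heap[next] = heap[i]; // move the object down
                heap[i] = null;       // its old slot becomes free
            }
            next++;
        }
        return forwarding;
    }

    public static void main(String[] args) {
        String[] heap = { "A", null, "B", null, null, "C" };
        int[] forwarding = compact(heap);
        System.out.println(Arrays.toString(heap));       // prints [A, B, C, null, null, null]
        System.out.println(Arrays.toString(forwarding)); // prints [0, -1, 1, -1, -1, 2]
    }
}
```

Before compaction the three free slots were scattered, so no two-slot object would fit between A and B. Afterwards the free space is a single block of three slots.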
Java’s Garbage Collection
In contrast to C++, where applying the delete operator to an object created with new runs its destructor and frees its memory, Java does not offer a corresponding mechanism for explicit object release.
Java's garbage collector is designed to handle the memory of objects created with new. Java also features a finalize() method that a class can override to release any specialized resources associated with an object before it is garbage collected. Note that finalize() has been deprecated since Java 9.
Before reclaiming the memory of an object, the Garbage Collector first invokes the finalize() method on that object. The actual memory reclamation for the object occurs during a subsequent cycle of garbage collection.
In Java, invoking System.gc() suggests to the JVM that it should perform garbage collection, which may in turn lead to object finalization. However, neither the collection nor the finalization is guaranteed to happen when this method is called.
Since its early days, Java has divided the heap into multiple spaces to optimize garbage collection.
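The "hint" nature of System.gc() can be seen with a small experiment. The sketch below is only an illustration; the class GcHintSketch and its methods are invented for it, and the 16 MiB of garbage is an arbitrary amount. It measures the used heap before and after the request, and the result is deliberately not asserted, because the JVM is free to ignore the call.

```java
public class GcHintSketch {
    // Approximate number of heap bytes currently occupied by objects.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Creates garbage, requests a collection, and returns {usedBefore, usedAfter} in bytes.
    static long[] hintAndMeasure() {
        byte[][] garbage = new byte[16][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1 << 20]; // 1 MiB each
        }
        long before = usedHeap();
        garbage = null;  // the arrays are now unreachable
        System.gc();     // a request, not a command
        long after = usedHeap();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        long[] r = hintAndMeasure();
        System.out.printf("used before hint: %d KiB, after: %d KiB%n",
                r[0] / 1024, r[1] / 1024);
    }
}
```

On a default HotSpot JVM the second number is usually smaller. Started with -XX:+DisableExplicitGC, the same program typically shows no drop, because that flag turns System.gc() into a no-op.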
- The Young Generation holds newly created objects, which are more likely to be garbage collected quickly.
- The Old Generation stores objects that have survived several garbage collection cycles, indicating they have a longer life.
- The Permanent Generation stored class metadata and, before Java 7, interned strings. It was removed in Java 8, and class metadata now lives in Metaspace, which is allocated from native memory.
As Java has evolved, work on garbage collection has continued to focus on performance and efficiency. This includes enhancements to the existing garbage collection algorithms and the introduction of new collectors like ZGC and Shenandoah, which aim to reduce pause times and handle large heaps more effectively.
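These spaces are visible at runtime through the standard java.lang.management API. The following sketch lists the memory pools of the running JVM; the class MemoryPoolsSketch and the helper poolNames() are invented for this example. Pool names are not standardized and differ between collectors and JVM versions, so the output is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public class MemoryPoolsSketch {
    // Returns the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```

On a recent HotSpot JVM the heap pools typically include young-generation spaces (Eden and Survivor) and an old-generation space, while Metaspace appears among the non-heap pools.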