
GC of Python

1. Basics: Reference Counting

The del statement removes a name, not the actual object. An object may be garbage collected as a result of del, but only if the deleted name held the last reference to the object, or if the object otherwise becomes unreachable. (Objects caught in circular references, however, can remain uncollected by reference counting alone; see below.)

In CPython, garbage collection is primarily based on reference counting. Essentially, each object keeps track of how many references are pointing to it—its refcount. As soon as the refcount drops to zero, CPython immediately calls the object’s __del__() method (if defined) and frees the memory allocated to it, effectively destroying the object.
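The refcount of an object can be inspected with sys.getrefcount(), which reports one extra reference for its own argument. A minimal sketch of how binding and unbinding names moves the count:

```python
import sys

a = []                      # the name `a` holds one reference to the new list
r1 = sys.getrefcount(a)     # +1 for getrefcount's own argument
b = a                       # a second name now references the same list
r2 = sys.getrefcount(a)
del b                       # drop that extra reference again
r3 = sys.getrefcount(a)
print(r1, r2, r3)           # r2 is exactly one higher than r1 and r3
```

When r3 would drop all the way to zero, CPython frees the object on the spot.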

Starting from CPython 2.0, a generational garbage collection algorithm was introduced to detect groups of objects involved in circular references. Other Python implementations may use more advanced garbage collectors that are not based on reference counting, so even when all references to an object are gone, the __del__() method may not be called immediately.

The special method __del__() does not destroy the object and should not be called directly from user code. __del__() is automatically invoked by the Python interpreter just before the object is destroyed, usually to clean up external resources. Properly using __del__() can be tricky and error-prone.
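Because CPython's refcounting is deterministic, __del__() runs as soon as the last reference disappears. A small illustration (the immediate timing is CPython-specific; other interpreters may defer it):

```python
log = []

class Resource:
    def __del__(self):
        # invoked by the interpreter just before the object is destroyed
        log.append("cleaned up")

r = Resource()
del r           # last reference gone -> __del__ runs immediately in CPython
print(log)      # ['cleaned up']
```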

2. Additional: Mark and Sweep

Reference counting cannot resolve memory leaks caused by circular references.

Circular references: A group of objects reference each other, so their refcounts are not zero, but they are still unreachable from the rest of the program.
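A sketch of such a leak, using weakref to probe whether the objects are still alive (automatic collection is disabled so that only the explicit gc.collect() call runs):

```python
import gc
import weakref

class Node:
    pass

gc.disable()                   # ensure only our explicit collect runs
a, b = Node(), Node()
a.partner, b.partner = b, a    # circular reference: each refcount stays >= 1
probe = weakref.ref(a)
del a, b                       # both names gone, objects now unreachable
leaked = probe() is not None   # True: refcounting alone cannot free the cycle
gc.collect()                   # the cycle detector finds and reclaims the group
freed = probe() is None        # True
gc.enable()
print(leaked, freed)
```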

The mark-and-sweep approach requires the garbage collector to pause execution and examine objects for cycles, which can negatively impact performance. For this reason, Python uses both reference counting and mark-and-sweep garbage collection mechanisms.

3. GIL (Global Interpreter Lock)

The CPython interpreter is not thread-safe internally, so it uses a Global Interpreter Lock (GIL). The GIL restricts execution to one thread at a time, meaning only one thread can execute Python bytecode at any given moment. As a result, a single Python process cannot fully utilize multiple CPU cores simultaneously.

This limitation is specific to the CPython interpreter, not the Python language itself. Implementations like Jython or IronPython do not have this restriction. However, even PyPy, currently the fastest Python interpreter, still has a GIL. Although we cannot directly control the GIL when writing Python code, built-in functions and C extensions can release the GIL when performing long-running operations. In fact, Python libraries written in C can manage the GIL themselves, spawn their own OS-level threads, and take advantage of multiple CPU cores. However, this makes the library code significantly more complex, so most library authors avoid this approach.

4. Blocking I/O

All standard library functions that perform blocking I/O operations release the GIL while waiting for results from the OS. This means Python programs that are I/O-bound can benefit from threading, even at the Python level.

For example, while one Python thread is waiting for a network response, the blocking I/O function releases the GIL, allowing another thread to run. Every blocking I/O function in the standard library is designed to release the GIL, enabling other threads to proceed. Even time.sleep() releases the GIL.
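Because blocking calls such as time.sleep() release the GIL, several waiting threads can overlap their waits. A small timing sketch:

```python
import threading
import time

def fake_io(i, results):
    time.sleep(0.2)          # blocking call: releases the GIL while waiting
    results[i] = i

results = {}
start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(i, results)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# the five 0.2 s waits overlap, so the total is close to 0.2 s, not 1 s
print(f"{elapsed:.2f}s", sorted(results))
```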

Therefore, Python threads can be extremely useful in I/O-bound applications, even with the presence of the GIL. This is why David Beazley once remarked: "Python threads are great at doing nothing."

5. Pros and Cons of Reference Counting

#define Py_DECREF(op)                                   \
    do {                                                \
        if (_Py_DEC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
        --((PyObject*)(op))->ob_refcnt != 0)            \
            _Py_CHECK_REFCNT(op)                        \
        else                                            \
        _Py_Dealloc((PyObject *)(op));                  \
    } while (0)

One advantage of reference counting is that it reclaims memory immediately instead of letting garbage accumulate, which can be beneficial in terms of memory efficiency. On the other hand, reference counts must be updated in real-time whenever a reference is added or removed.

Updating the reference count must be an atomic operation. Without atomicity, concurrent access from multiple threads can lead to race conditions and incorrect behavior. To ensure atomicity, every operation would require locking, which can seriously impact performance and even lead to deadlocks in the worst case. As a solution, CPython enforces a global lock (the GIL) to guarantee atomicity of reference count updates.
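The same read-modify-write hazard can be illustrated at the Python level: incrementing a shared counter from several threads is not atomic either, and needs a lock to stay correct. (This is an analogy for the refcount problem, not CPython's actual mechanism; CPython relies on the GIL rather than per-object locks.)

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        # `counter += 1` is a read-modify-write; without the lock,
        # two threads could interleave and lose increments
        with lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 400000: always correct with the lock held
```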

CPU-bound and Multi-threaded Programs

// Only one lock -> Only one CPU calculates!
CPU1: [🔒Calc🔒] ---> [  Idle  ]
CPU2: [  Idle  ] ---> [🔒Calc🔒] ---> [  Idle  ] ---> [🔒Calc🔒]
CPU3: [  Idle  ] -------------------> [🔒Calc🔒] ---> [  Idle  ]

⇒ A bottleneck scenario

I/O-bound and Multi-threaded Programs

CPU1: [🔒CPU🔒] ---> [R/W (GIL Released)]
CPU2: [  Idle  ] ---> [🔒CPU🔒] ---> [R/W (GIL Released)]
CPU3: [  Idle  ] ---------------> [🔒CPU🔒] ---> [R/W (GIL Released)]

⇒ Performance gains: most everyday programs are I/O-bound, so they can benefit from threading even with the GIL.