PHP Master | Better Understanding PHP’s Garbage Collection

It’s interesting how just a few years can make a difference in the names that are given to things. If this were to come up today, it would probably be called PHP Recycling Options, because rather than picking things up and throwing them into a landfill where they’ll never be seen again, we are really talking about grabbing things whose use has passed and setting them up to be useful again. But recycling wasn’t la petite chérie of society back when the idea was developed, and so this task was given the vulgar name of ‘Garbage Collection’. What can we do but follow what history and common usage have given us?

Key Takeaways

PHP’s garbage collection operates on three levels: end of scope, reference counting, and formal garbage collection. The end of scope level clears resources when a function, script, or session ends. Reference counting keeps track of how many entities are using a given variable, and when this count hits zero, the variable is destroyed. Formal garbage collection, introduced in PHP 5.3, handles circular references: values whose reference count stays above zero even though nothing outside the cycle can still reach them.
PHP’s garbage collection is enabled by default, but it can be controlled manually. It can be turned off in the php.ini file with the zend.enable_gc directive, or within a script using the gc_disable() and gc_enable() functions. The gc_collect_cycles() function starts a collection manually, and the size of the root buffer can be changed in the PHP source code.
While garbage collection is beneficial for managing memory allocation and preventing memory leaks, it can also impact performance due to its resource-intensive nature. Therefore, it should be used strategically, especially in the case of long-running scripts or scripts that don’t end. In these cases, garbage collection is essential to prevent memory leaks.
Good programming practices can help optimize garbage collection. These include minimizing or eliminating global variables, tying variables to scope, and being aware of situations where arrays contain references to themselves or objects reference each other. These circular references can lead to memory leaks and are the primary target of the formal garbage collection process.

Program Generated Garbage

Programs use resources; sometimes small ones, sometimes much bigger. An example would be a data field. A program may define a data field, say a sequence number, that is used in the program. And once defined, this data field will take up space in memory, probably only a few bytes, but space nonetheless. Since every machine or programming environment has a finite (albeit large) amount of space available, the remaining space that it has left will be reduced by the amount of space that this field takes up.

When the program ends, naturally, the program and any space it has tied up will disappear, and the total space available will expand back to its maximum size. But what happens if the program never ends?

I’ve written a few such programs in my time. Works of beauty they were, and I was always pleased when everyone else in the shop noticed that I had created one. There’s nothing that points out your capabilities quite as much as bringing a big ol’ piece of IBM iron to a standstill all by yourself, while from the surrounding cubicles one person after another says loudly, “hey, is there something wrong with the system?” The trick is to chime in second or third so as to deflect attention from yourself.

But some programs are even meant to run forever, like daemons and other such things. And as they run, the amount of debris they generate can potentially keep growing. If the locked up resources are substantial, then it can have a real negative impact on your system.

As a result, every language must have a way of clearing out orphaned resources, making them available to other users and ensuring that the total available system space remains constant. Fortunately, PHP has a three-tiered approach to garbage removal.

First Level – End of Scope

First, like most languages, whenever a scope of action ends, everything within that scope of action is destroyed and any allocated resources are released. The scope of action can cover a function, a script, a session, etc., and when that scope ends, so does everything it is holding on to. You can also release a variable at any time with the unset() function; the memory behind it is freed as soon as nothing else is using that value.

This is one reason why functions and methods are so important: they establish a scope of action that defines when particular memory usage begins and when it should end, and so limit how long things stay around. Use them instead of global entities whenever possible.
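To see the end-of-scope behavior for yourself, here is a minimal sketch, assuming PHP 8 or later. countBigArray() is a hypothetical helper. memory_get_usage() reports the memory PHP currently has allocated, so an array built inside a function should leave almost nothing behind once the function returns, while the same array at script level is only released by unset() (or the end of the script). Exact byte counts vary between PHP versions.

```php
<?php
// Sketch (assumes PHP 8+): compare an array that lives inside a function
// with one that lives at script level.

// Hypothetical helper: builds an array of $size integers and counts it.
function countBigArray(int $size): int
{
    $data = [];
    for ($i = 0; $i < $size; $i++) {
        $data[] = $i;
    }
    return count($data); // $data goes out of scope here and is released
}

$beforeCall = memory_get_usage();
$count = countBigArray(200000);
$afterCall = memory_get_usage();

// The same kind of array built at script level stays alive until unset().
$data = [];
for ($i = 0; $i < 200000; $i++) {
    $data[] = $i;
}
$withArray = memory_get_usage();
unset($data); // drops the only reference, so the array is freed at once
$afterUnset = memory_get_usage();

printf("Still held after the function returned: %d bytes\n", $afterCall - $beforeCall);
printf("Released by unset(): %d bytes\n", $withArray - $afterUnset);
```

The function version never needs an explicit unset(): leaving the scope does the work, which is the argument for keeping data inside functions rather than in globals.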

Second Level – Reference Counting

Second, like most scripting languages, PHP keeps track of how many entities are using a given variable using a technique called reference counting.

When a variable is created in a PHP script, PHP creates a little ‘container’ called a zval that consists of the value assigned to that variable plus two other pieces of information: is_ref and refcount. The zval containers are kept in a table where there is one table per scope of action (script, function, method, whatever).

is_ref is a simple true/false value that indicates if the variable is part of a reference set, thus helping PHP to tell if this is a simple variable or a reference.

The refcount is more interesting in that it holds a numeric value indicating how many different variables are using this value. That is, if you define $dave = 6, the refcount will be set to 1. If you then say $programmer = $dave, the refcount will be incremented to 2. PHP knows enough not to create a second zval for the value 6; it just updates the counter on the already existing container. When the program ends, when we leave the scope of the function, or when unset() is used, the refcount is decremented. When the refcount hits zero, the zval is destroyed and any memory it was holding is freed. (This describes PHP 5, current when this was written. Since PHP 7, simple scalars such as integers are copied rather than reference counted, but strings, arrays and objects are still counted this way.)

Of course, this is a simple example for a simple variable. With arrays or objects it is more complicated, because each element of an array has its own zval in addition to the zval for the array itself, but the basic processing is the same.
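Counting like this is easiest to observe with an object, since an integer gives no signal when it is freed. The sketch below assumes PHP 8 or later and uses a hypothetical Tracked class whose destructor logs the moment its reference count reaches zero.

```php
<?php
// Sketch (assumes PHP 8+). Tracked is a hypothetical class whose
// destructor records when the last reference to an instance goes away.
class Tracked
{
    /** @var string[] event log shared by all instances */
    public static array $log = [];

    public function __construct(private string $name) {}

    public function __destruct()
    {
        self::$log[] = "destroyed {$this->name}";
    }
}

$dave = new Tracked('dave');   // one variable points at the object
$programmer = $dave;           // two variables, one object: count is 2
unset($dave);                  // count drops to 1, the object survives
Tracked::$log[] = 'after unset($dave)';
unset($programmer);            // count drops to 0, destructor runs right now
Tracked::$log[] = 'after unset($programmer)';

print_r(Tracked::$log);
```

The destructor entry appears between the two log lines, showing that the object is destroyed at the exact moment the second variable lets go of it, not at the end of the script.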

A problem occurs, however, when a value ends up referring to itself, directly or indirectly: an array that contains a reference to itself, or objects that point at each other. This happens with some frequency in more complicated PHP scripts. Take an array that holds a reference to itself: its refcount is 1 for the variable plus 1 for its own element, making 2. When the variable goes out of scope or is unset(), the refcount is decremented to 1. We are now in a situation where the value is no longer reachable from anything in the script, but the container (zval) that represents it still has a refcount greater than zero.

The end result is that the storage represented by the original array will not be freed by reference counting alone, and that amount of memory is now unavailable for use by anything. Normally, we think of this amount of lost storage as being small, but often it isn’t. Arrays can be very big things today, and it is especially problematic if the script in which this occurs is a daemon or other nearly continuously running process. In this case, the resultant ‘memory leak’ can have devastating consequences on performance and even on the ability of a server to operate.
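Here is a minimal sketch of the situation, assuming PHP 8 or later. It uses two objects that point at each other rather than a self-referencing array, and the Node class is hypothetical; its destructor counts how many instances have actually been destroyed.

```php
<?php
// Sketch (assumes PHP 8+): two objects that point at each other form a
// cycle. After unset() nothing in the script can reach them, yet each
// one is still referenced by the other, so neither count reaches zero.
class Node
{
    public ?Node $peer = null;
    public static int $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_disable(); // keep the cycle collector out of the picture for this demo

$a = new Node();
$b = new Node();
$a->peer = $b;   // object B: referenced by $b and by object A
$b->peer = $a;   // object A: referenced by $a and by object B

unset($a, $b);   // each count drops from 2 to 1, not to 0

echo Node::$destroyed, "\n"; // 0: reference counting alone leaves both alive
```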

Third Level – Formal Garbage Collection

Obviously, reference counting alone has its limitations, but fortunately PHP 5.3 added another option to help with this situation.

The specific situation that we want our garbage collection cycle to address is the case where a zval’s refcount has been decremented but is still non-zero. Basically, the collector works out which of these counts would drop further if references coming from inside a cycle were ignored, and frees the ones that reach zero.

What really happens is that PHP keeps track of all possible root containers (zvals): whenever a refcount is decremented to a non-zero value, that zval is recorded in a root buffer. This is done whether garbage collection is turned on or not (because it is faster to just do it than to check whether garbage collection is on, yada, yada, yada). The root buffer holds up to 10,000 roots (a fixed size, though this can be changed). When it fills up, the garbage collection mechanism kicks off and begins analyzing the buffer.

The first thing the GC routine does is rip through the root buffer and, starting from each root, decrement by 1 the refcount of every zval it can reach. As it does this, it marks each zval with a little check mark (the PHP manual calls this marking it ‘grey’) so that no zval is decremented twice.

Then it goes through again and marks (this time with a little squiggly line, ‘white’ in the manual’s terms) all of the zvals whose reduced counts are zero. The ones that are not zero have their decrements undone, so that they resume their original values.

Finally, it rolls through one more time, removing the roots from the buffer and freeing the storage of every zval whose refcount reached zero.
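The three passes can be modelled with a toy simulation. This is a sketch in PHP of the trial-deletion idea described above, not PHP’s actual C implementation in zend_gc.c. The GcNode class and the function names are hypothetical, and the colours follow the PHP manual’s naming: grey for decremented, white for garbage, black for in use.

```php
<?php
// Toy model (not PHP's real C code) of the three collection passes.
final class GcNode
{
    public int $refcount = 0;
    public string $color = 'black';   // black = in use, grey = decremented, white = garbage
    /** @var GcNode[] */
    public array $children = [];

    public function __construct(public string $name) {}

    // Record a reference from this node to $child.
    public function pointTo(GcNode $child): void
    {
        $this->children[] = $child;
        $child->refcount++;
    }
}

// Pass 1: decrement every count reachable from a root; grey = "already done".
function markGrey(GcNode $node): void
{
    if ($node->color === 'grey') {
        return;
    }
    $node->color = 'grey';
    foreach ($node->children as $child) {
        $child->refcount--;
        markGrey($child);
    }
}

// Pass 2: zero counts become white (garbage); anything else is restored.
function scan(GcNode $node): void
{
    if ($node->color !== 'grey') {
        return;
    }
    if ($node->refcount > 0) {
        scanBlack($node);
        return;
    }
    $node->color = 'white';
    foreach ($node->children as $child) {
        scan($child);
    }
}

// Undo the pass-1 decrements for everything reachable from a live node.
function scanBlack(GcNode $node): void
{
    $node->color = 'black';
    foreach ($node->children as $child) {
        $child->refcount++;
        if ($child->color !== 'black') {
            scanBlack($child);
        }
    }
}

// Pass 3: "free" white nodes by recording their names.
function collectWhite(GcNode $node, array &$freed): void
{
    if ($node->color !== 'white') {
        return;
    }
    $node->color = 'black';           // so it is not visited twice
    foreach ($node->children as $child) {
        collectWhite($child, $freed);
    }
    $freed[] = $node->name;
}

function collectCycles(array $roots): array
{
    foreach ($roots as $root) {
        markGrey($root);
    }
    foreach ($roots as $root) {
        scan($root);
    }
    $freed = [];
    foreach ($roots as $root) {
        collectWhite($root, $freed);
    }
    return $freed;
}

// A <-> B is an unreachable cycle; C <-> D is a cycle the script still
// holds through one outside reference to C.
$a = new GcNode('A');
$b = new GcNode('B');
$c = new GcNode('C');
$d = new GcNode('D');
$a->pointTo($b);
$b->pointTo($a);
$c->pointTo($d);
$d->pointTo($c);
$c->refcount++;                       // the outside reference to C

$freed = collectCycles([$a, $c]);     // A and C are the buffered possible roots
print_r($freed);
```

A and B are freed because, once the references inside their cycle are discounted, nothing else holds them. C and D survive because the outside reference keeps C’s count above zero, and their counts end up back at their original values.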

Garbage collection is enabled by default in PHP, but you can turn it off in the php.ini file with the zend.enable_gc directive. Or you can switch it off and on within your script by calling the gc_disable() and gc_enable() functions.

As noted above, garbage collection, if enabled, runs when the root buffer is full, but you can override this and run a collection whenever you like with the gc_collect_cycles() function. You can also change the size of the root buffer in the PHP source code (in PHP 5.3, the GC_ROOT_BUFFER_MAX_ENTRIES constant in Zend/zend_gc.c) and recompile PHP.
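A minimal sketch of these controls, assuming PHP 8 or later. gc_enabled() reports the current state; the Node class is hypothetical and exists only to leave some unreachable cycles behind for gc_collect_cycles() to find.

```php
<?php
// Sketch (assumes PHP 8+): switch the cycle collector off and on at
// runtime, then run it on demand. Node is a hypothetical class.
class Node
{
    public ?Node $peer = null;
}

gc_enable();
$wasEnabled = gc_enabled();    // true: we just enabled it (On is also the default)

gc_disable();                  // automatic cycle collection is now off
$whileDisabled = gc_enabled(); // false

gc_enable();                   // and back on again

// Leave 100 unreachable two-object cycles behind.
for ($i = 0; $i < 100; $i++) {
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a;
}
unset($a, $b);

// Far fewer than 10,000 possible roots, so no automatic run has happened;
// collect now instead of waiting for the buffer to fill.
$collected = gc_collect_cycles();
printf("gc_collect_cycles() returned %d\n", $collected);
```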

All in all, this lets you control whether GC runs, and when and where it does, which is a good thing because it is a bit resource intensive and so not the sort of thing you run just for the heck of it.

When Should You Use It

Because there is a performance hit attached to garbage collection, it is worth taking a minute to figure out when it should be used.

First, keep in mind that unless you run it explicitly (with the gc_collect_cycles() function), formal garbage collection will not happen until the root buffer (10,000 entries) is full. The buffer is shared by the whole script rather than kept per scope, but small, short-lived scripts rarely generate enough possible roots to fill it.

Should you use it on small scripts? That’s up to you. It’s hard to say that running something like garbage collection is a bad thing, but if you have small, quick running scripts that start and then end and are gone then there might not be much of a payback. But if your server is running a lot of small scripts that stay persistent, then it will probably be worth the effort. The only real way to know is to benchmark your application and see. And certainly, if you have long running scripts or especially scripts that do not end, then garbage collection is essential if you want to prevent the kind of memory leaking that we talked about above.
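For a long-running script, one pattern is to let the work run and call gc_collect_cycles() at regular intervals. This is a sketch under assumptions (PHP 8 or later): the Job class, processJob() and the COLLECT_EVERY interval are hypothetical, and whether periodic collection pays off for your workload is exactly the kind of thing to benchmark.

```php
<?php
// Sketch of a long-running worker loop (assumes PHP 8+).
// Job, processJob() and COLLECT_EVERY are hypothetical.
class Job
{
    public ?Job $parent = null;
    public array $children = [];
}

// Hypothetical unit of work that leaves a parent/child cycle behind.
function processJob(): void
{
    $parent = new Job();
    $child = new Job();
    $parent->children[] = $child;
    $child->parent = $parent;  // cycle: reference counting alone won't free these
}

const COLLECT_EVERY = 1000;    // hypothetical tuning knob
$totalCollected = 0;

for ($id = 1; $id <= 5000; $id++) {
    processJob();
    if ($id % COLLECT_EVERY === 0) {
        // Clean up cycles on our own schedule instead of waiting
        // for the root buffer to fill.
        $totalCollected += gc_collect_cycles();
    }
}

printf("Collected during the run: %d\n", $totalCollected);
```

Collecting on a fixed schedule keeps the pauses predictable; collecting less often means fewer runs but more leaked memory held between them.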

Perhaps most importantly, we should always try to follow good programming guidelines so that we minimize or eliminate global variables and tie our variables to scope instead, so that even if we have a long running script, we free up that memory when the function, rather than the script, ends. Also be aware of when you are using arrays that reference themselves or objects that reference each other, since such circular references can cause memory leaks and are the real target of the formal garbage collection process.


Frequently Asked Questions (FAQs) about PHP’s Garbage Collection

What is the purpose of garbage collection in PHP?

Garbage collection in PHP is a mechanism that helps to manage memory allocation. It automatically frees up memory that is no longer in use or needed by the program. This is crucial in preventing memory leaks, which can slow down or even crash a program. Garbage collection works by identifying and removing objects that are no longer accessible by the program, thus freeing up the memory they were occupying.

How does PHP’s garbage collection work?

PHP’s garbage collection combines reference counting with a cycle collector. Every value keeps a count of how many references point to it, and it is freed as soon as that count drops to zero. When a count is decremented but stays above zero, PHP records the value as a possible ‘root’ of a garbage cycle in the root buffer. When the buffer fills up, or when gc_collect_cycles() is called, the collector checks whether those roots are kept alive only by references from inside their own cycle, and if so frees them.

How can I trigger garbage collection manually in PHP?

You can manually trigger garbage collection in PHP using the gc_collect_cycles() function. This function forces a collection of any existing garbage cycles. It returns the number of collected cycles, which can be useful for debugging purposes.

What is the impact of garbage collection on PHP performance?

Garbage collection can have a significant impact on PHP performance. It helps to prevent memory leaks, which can slow down or crash a program. However, the garbage collection process itself can also consume resources and slow down a program, especially if it is triggered frequently or if there are a large number of objects to collect.

How can I optimize garbage collection in PHP?

There are several ways to optimize garbage collection in PHP. One way is to minimize the number of objects that need to be collected. This can be done by carefully managing your references and making sure to unset or nullify any references that are no longer needed. Another way is to manually trigger garbage collection at strategic points in your program, such as during low-traffic periods or after a large number of objects have been created.
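One way to keep plain reference counting in charge is to break a cycle yourself before dropping the last variables that point into it. A minimal sketch, assuming PHP 8 or later, with a hypothetical Node class whose destructor counts destroyed instances:

```php
<?php
// Sketch (assumes PHP 8+): break a cycle by hand so reference counting
// can free both objects without the cycle collector.
class Node
{
    public ?Node $peer = null;
    public static int $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

gc_disable(); // show that the cycle collector is not what frees them

$a = new Node();
$b = new Node();
$a->peer = $b;
$b->peer = $a;

$a->peer = null;   // break the cycle: object B is now held only by $b
unset($a);         // object A survives, because B still points at it
unset($b);         // B's count hits 0; freeing B releases the last hold on A

echo Node::$destroyed, "\n"; // 2
```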

What is the difference between garbage collection and memory management in PHP?

While both garbage collection and memory management are related to how PHP handles memory, they are not the same thing. Memory management refers to how PHP allocates and deallocates memory for objects. Garbage collection, on the other hand, is a specific mechanism within memory management that automatically frees up memory that is no longer in use.

How does PHP’s garbage collection compare to other languages?

PHP’s garbage collection is similar to that of other languages in that it automatically frees up memory that is no longer in use. However, the specifics of how garbage collection is implemented can vary between languages. For example, some languages use a ‘mark and sweep’ algorithm, while PHP relies on reference counting and adds a cycle collector for the circular references that reference counting cannot free.

Can I disable garbage collection in PHP?

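Yes, you can disable garbage collection in PHP using the gc_disable() function, or for every script with the zend.enable_gc setting in php.ini. This switches off only the cycle collector; reference counting still frees ordinary values as usual. Even so, it is generally not recommended, as circular references will then leak memory. If you do choose to disable garbage collection, make sure to carefully manage your references to prevent memory from being unnecessarily consumed.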

What are garbage cycles in PHP?

Garbage cycles in PHP occur when there are groups of objects (or arrays) that reference each other but are not referenced by anything else. Their reference counts never reach zero, so reference counting alone cannot free them. The cycle collector frees them automatically when the root buffer fills up, and the gc_collect_cycles() function can be used to collect them on demand.

How can I monitor garbage collection in PHP?

You can monitor garbage collection in PHP 7.3 and later using the gc_status() function. It returns an array with the number of collector runs (‘runs’), the number of values collected so far (‘collected’), the number of possible roots that triggers the next automatic run (‘threshold’), and the number of possible roots currently in the buffer (‘roots’). PHP 8.3 adds further entries, such as whether the collector is currently running and timing information.
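A small sketch of reading those counters around a manual collection, assuming PHP 8 or later (gc_status() itself dates from PHP 7.3). The Node class is hypothetical and only serves to create some garbage cycles.

```php
<?php
// Sketch (assumes PHP 8+). Node is a hypothetical class.
class Node
{
    public ?Node $peer = null;
}

gc_enable();
$before = gc_status();

// Leave 50 unreachable two-object cycles behind, then collect them.
for ($i = 0; $i < 50; $i++) {
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a;
}
unset($a, $b);
gc_collect_cycles();

$after = gc_status();

// 'runs', 'collected', 'threshold' and 'roots' are present from PHP 7.3 on.
printf("runs: %d -> %d\n", $before['runs'], $after['runs']);
printf("collected: %d -> %d\n", $before['collected'], $after['collected']);
printf("threshold: %d, roots buffered now: %d\n", $after['threshold'], $after['roots']);
```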