Garbage collection is now an accepted as a customary element for dynamic memory management in run-time systems of modern programming languages, for reasons of safety and for its clear software engineering benefits. The wide-spread acceptance of object-oriented languages makes it more important than ever to have garbage collection work as fast as possible.
The state of the art in garbage collection on stock hardware are multi-generational copying collectors. In our work, we show that the tenents on which these collectors are based: (1) the mortality of young objects and (2) pointer directions and distances do not always hold for languages such as Smalltalk and Java and thus suggest alternate age-based schemes. Generational collectors concentrate on young objects because they have the highest mortality rate, but the youngest objects also include the most recently allocated objects which are clearly still live. In our measurements of Smalltalk and Java programs, we also find that the widely held belief that the vast majority of pointers point from young to old objects does not hold. Pointer directions are instead in both directions, and that average pointer distances are short. Notice that while these properties work in favor of generational collection they also work in favor of any other age-order-preserving collection scheme.
We use simulation to explore a wide variety of alternate, existing, and locally optimal collection schemes. We focus on age-based (generational) algorithms, namely those in which the dynamic heap is divided into regions, each of which contains the data allocated during a contiguous interval, and a collection step examines one or more regions to determine which objects within it are live (potentially used by the program in the future) and which are dead (garbage). We introduce and evaluate alternatives, including variations on oldest-first. We find that for many programs an oldest-first collector provides better performance (i.e., it copues much less) than traditional collection, and is close to optimal in some cases.
Based on these results and to capture cache and paging effects, we are currently implementing a selection of the best performing oldest-first collectors to compare them to our existing generational collectors, and thus further explore and understand their performance.