GCOld: a benchmark to stress old-generation collection
Introduction
In the course of developing garbage collectors in
Java[tm] Virtual Machine (JVM) implementations, we have noticed several
characteristics of server applications that have large heaps. Often,
an application will have a mix of allocated objects with quite
different characteristic lifetimes. The weak generational
hypothesis is often true: most allocated data is short-lived. But
the case for the strong generational hypothesis, which asserts
that younger objects are more likely to be garbage than older objects
over all age ranges, is less clear. We often find that
server applications have interactions with users on time scales of
minutes (think of visiting a web site), and allocate data at the
beginning of such an interaction that persists for its duration.
If the durations of such interactions are sufficiently
similar, then the overall lifetime behavior of such data is a "FIFO":
the oldest data is most likely to be garbage.
The GCOld benchmark is a rudimentary attempt to model
a range of applications with these general object lifetime
characteristics.
Description
A run of GCOld consists of an initialization phase and a steady
state. In our measurements we generally disregard the initialization
phase and concentrate on the steady state. The program maintains an
array of pointers to heads of binary trees, each a megabyte in size.
The initialization phase consists of allocating the binary trees and
initializing the array.
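For concreteness, the long-lived structure looks roughly like the
following sketch. The identifiers here (TreeNode, buildTree, init) are
illustrative, not necessarily those used in GCOld.java:

    // Illustrative only: a binary tree node of roughly BYTES_PER_NODE
    // bytes (object header plus two pointers plus an int field).
    class TreeNode {
        TreeNode left, right;
        int val;
    }

    class InitSketch {
        // Build a complete binary tree of the given height.
        static TreeNode buildTree(int height) {
            if (height == 0) return null;
            TreeNode n = new TreeNode();
            n.left = buildTree(height - 1);
            n.right = buildTree(height - 1);
            return n;
        }

        // Initialization phase: one tree of roughly a megabyte per slot,
        // so the array holds about live-data-size megabytes in total.
        static TreeNode[] init(int liveDataMegs, int treeHeight) {
            TreeNode[] trees = new TreeNode[liveDataMegs];
            for (int i = 0; i < trees.length; i++)
                trees[i] = buildTree(treeHeight);
            return trees;
        }
    }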
The steady state consists of a number of steps. Each step:
- allocates some amount of short-lived data;
- does some amount of "mutator work";
- allocates some portion of a new binary tree that, when complete,
replaces one already in the array (making the previous tree
unreachable); and
- does some amount of "pointer mutation" to the long-lived trees
(which are likely to be in the old generation in a generational
collector).
The amount of each activity done on each step, and the number of steps,
are controlled by command-line arguments, described below.
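Very roughly, a single step interleaves these activities along the
following lines. This is a sketch only; the method names and the way
the ratios are applied are assumptions, not GCOld's actual code:

    // Illustrative skeleton of the steady-state loop.
    class StepSketch {
        // Short-lived allocation: the arrays die as soon as the loop advances.
        static void allocateShortLived(int bytes) {
            for (int i = 0; i < bytes / 64; i++) {
                byte[] shortLived = new byte[64];
            }
        }

        // Non-allocating mutator "work": burns time without touching the heap.
        static long doMutatorWork(long units) {
            long sum = 0;
            for (long i = 0; i < units; i++) sum += i;
            return sum;
        }

        static void runSteps(int steps, int shortLivedBytes, long workUnits) {
            for (int s = 0; s < steps; s++) {
                allocateShortLived(shortLivedBytes);
                doMutatorWork(workUnits);
                // ... allocate part of a replacement tree, install it when
                //     complete (unlinking the old one), then perform the
                //     requested number of pointer mutations (sketched below).
            }
        }
    }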
The "pointer mutation" work is added because the performance of some
GC algorithms or components thereof (e.g., generational card scanning,
some forms of concurrent collection) is strongly affected by the rate
at which the mutator writes to old-generation objects. Each unit of
pointer mutation consists of a random choice of a pair of reachable
binary trees and a path into each. The two subtrees at that path,
one in each tree, are swapped, so that the size of each tree remains
the same (and the steady-state live data size remains constant).
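The following sketch shows one such mutation, reusing the hypothetical
TreeNode class from the sketch above; the choice of depth and path shown
here is an assumption about the details:

    import java.util.Random;

    // Illustrative only: swap the subtrees found at the same path in two
    // trees. Because the path has the same depth in both (complete) trees,
    // each tree keeps its size, but old-to-old pointer stores are performed,
    // exercising card marking or similar write-barrier machinery.
    class MutationSketch {
        static final Random rnd = new Random();

        // Assumes depth is less than the height of both trees.
        static void swapSubtrees(TreeNode t1, TreeNode t2, int depth) {
            // Walk the same random left/right path in both trees.
            for (int i = 0; i < depth; i++) {
                if (rnd.nextBoolean()) { t1 = t1.left;  t2 = t2.left;  }
                else                   { t1 = t1.right; t2 = t2.right; }
            }
            // Exchange one child pointer between the two trees.
            TreeNode tmp = t1.left;
            t1.left = t2.left;
            t2.left = tmp;
        }
    }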
Arguments
An invocation of GCOld has the form:
java GCOld live-data-size work short/long-ratio pointer-mut-rate steps
where
- live-data-size: steady-state reachable data size, in
megabytes (approximate; will be rounded down). Currently
assumes 8-byte object headers and 4-byte pointers. You could
adjust this for different systems by changing the static final
int "BYTES_PER_NODE". (This could be better.)
- work: units of mutator non-allocation work per byte allocated
(in unspecified units). This affects the promotion rate printed
at the end of the run: more mutator work per step implies fewer
steps per second, which implies fewer bytes promoted per second.
- short/long-ratio: ratio of short-lived bytes allocated to long-lived
bytes allocated.
- pointer-mut-rate: number of pointer mutations per step.
- steps: number of steps to do.
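For example, an invocation such as

    java GCOld 100 50 20 10 10000

(where the numbers are purely illustrative, not recommended settings)
asks for roughly 100 megabytes of steady-state live data, 50 units of
mutator work per byte allocated, 20 short-lived bytes allocated per
long-lived byte, 10 pointer mutations per step, and 10000 steps.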
Authors
This benchmark was originally developed by Dave Detlefs.
Matthias Jacob found some bugs, and Will Clinger did
an extensive rewrite (with several bug fixes), leading to the 1.0
version.