next up previous
Next: Related Work Up: Benchmarking the System Previous: Sparse matrix-vector multiplication benchmark

Memory Usage

The space efficiency of an intermediate language is often equally as important as its time efficiency. The Java VcodeEmulation class and the existing CVL implementation use essentially the same data types, and so their memory usage per vector is similar. For example, a Java integer array of length n occupies 4n + 16 bytes in the Sun JDK, compared to 4n bytes in a typical C implementation. However, the dynamic memory usage of the VCODE and Java interpreters differ. The VCODE interpreter is optimized for the case of a few big objects (vectors), whereas Java's general-purpose memory allocation mechanism is optimized for many small objects. In particular, the VCODE interpreter uses reference counting to determine when a vector is no longer used, and hence can reclaim its space immediately. The interpreter has to halt and compress vector memory only when it can no longer find a free fragment large enough to satisfy a request. By comparison, current Java virtual machines typically perform garbage collection only when the system is idle, when there is no longer enough free memory, or on demand.

As an example, the VCODE interpreter requires just over 1.75 MB of heap to run the line-fit benchmark on input vectors of length 216 (65536) without performing memory compaction. A double-precision floating-point vector of this length requires 0.5 MB of memory, so the VCODE interpreter is storing at most three of these vectors at any one time. The JDK Java interpreter using the original VcodeEmulation class requires 8.5 MB of heap to run the same benchmark without triggering a garbage collection, because it cannot implicitly reclaim the space used by the temporary vectors that the benchmark generates. We therefore extended VcodeEmulation to reuse the last vector popped from the top of the stack whenever possible (i.e., whenever the vector is of the right length and type to be used for a result). This modification reduces the minimum Java heap size required to run the benchmark without garbage collection to 3.5 MB, and all benchmark results are for this modified version of VcodeEmulation. A full reference-counting algorithm similar to that employed by the VCODE interpreter would probably reduce the memory usage still further. This would effectively be implementing a second level of garbage collection specialized for our particular language, which has rather unusual memory usage characteristics.


next up previous
Next: Related Work Up: Benchmarking the System Previous: Sparse matrix-vector multiplication benchmark

Jonathan Hardwick