Often, when tuning performance in many applications, pools of objects offer significant speed gains, particularly when the objects are small, used often and have short lifetimes, or when the cost of setting up the objects is high. For example, in a Macintosh application that makes significant use of graphic regions, a pool of preallocated region handles will speed your code up tremendously.
With that in mind, DivNull Software investigated the possibility of pooling a
specific type of small, often used objects with short lifetimes in a Java
application:
StringBuffer
.
Contrary to expectations, this experiment determined that pooling
StringBuffer
s in Java application actually reduces
performance to a significant degree. What follows describes the experiment and
its findings.
There are no doubt many ways to conceive of a StringBuffer
pool,
but the most logical way seemed to be having a number of "buckets" of
StringBuffer
s of particular, standard sizes. Generally, a
programmer will ask for a buffer of a specific size, and the buffer will stay
that size for the duration of its use. The StringBuffer
has the
ability to grow as it is being used (though smart programmers avoid this, as it
drags down performance), so it is possible that a StringBuffer
taken from one of these buckets might be returned to a different bucket if it
had grown in the interim.
The smallest bucket size was arbitrarily chosen to be 32 characters (which is either 32 or 64 bytes, depending on if your character encoding). Additional buckets would be the successive powers of 2 from the initial size (i.e. 64, 128, etc.) until reaching a bucket size of 4096 characters. Any buffers of a size larger than this would be placed in their own buckets.
As time of development was an issue during the experiment, open source code was used to provide the mechanics of the pool rather than develop it from scratch. In particular, the commons-pool code from Jakarta was used.
None of the existing pool types supplied by commons-pool exactly matched what
was needed, but the StackKeyedObjectPool
came fairly
close, so was chosen as a base for the pool, using the size of the bucket as the
key.
As a secondary objective, the comparitive speeds of various methods of concatinating strings were also investigated.
Four Java classes were
created for the experiment. Implementation was fairly straightforward, but one area
caused more concern than expected. All of Java's collection classes store only
objects (e.g. Integer
) and cannot store basic types (e.g.
int
), including keys. Care needed to be taken, therefore, to avoid
having to allocate Integer objects to represent the keys, since the whole point
of the pool was to avoid such small allocations. The "bucket" metaphor allowed
this to be resolved, but getting the pool to work while avoiding extra Integer
allocations proved slightly trickier than expected. Remember, these classes are
basically not usefull, since they slow things down rather than speed them up:
A subclass of StackKeyedObjectPool
, instantiating
this class provides an instance of the string pool. This class also defines some
public static members to act as the predefined key sizes as well as methods to
get StringBuffer
s of those specific sizes from the pool. These
methods (rather than borrowObject
) are intended to be the main
interface to the pool. This object also contains a method to return a borrowed
buffer that is typed specifically for StringBuffer
, confining the
need for type coercion to inside the class.
The borrowing process will return a string from the smallest bucket that is equal to or larger than the buffer size requested. For example, the pool includes 32 and 64 buckets, so if you asked for a 40 character buffer, you'd get one from the size 64 bucket. Thus, you get a buffer at least as big as the one you asked for, and possibly bigger. If a request is made for a string larger than the largest standard bucket (4k), a new bucket is created just for that size.
Most of the code in the class deals with returning a
StringBuffer
to the pool, since it can be returned larger than it
was when borrowed. In the most common case (where the size remains exactly equal
to one of the bucket sizes), this process is very quick (a binary search,
effectively). If the buffer is larger than the standard set, the pool is
searched for keys of that size and the key is used if found. If not, a new key
is allocated.
A subclass of BaseKeyedPoolableObjectFactory
, this class takes care of allocating the StringBuffer
s and
clearing their contents (without altering their size) when they are
returned.
Following the singleton pattern, this is a convenience class that is intended to be the primary access to the string pool in most applications (as most apps will only need one string pool). In addition to acting as a single access point, it also handles exceptions.
A JUnit test case that runs a speed test on various string code. Six techniques for building a string are used in the test. A timer is started for each technique, which is repeated 10,000 times and the combined time reported. In addition to testing the pool speed, the effective performance of each technique was also studied.
String
objects, where a String is built from smaller strings, using a pattern like str = a+b+c+d
.String
objects, where a String is built from smaller strings, using repeated calls that look like str = str + a
.String
objects, where a String is built from smaller strings, using repeated calls that look like str.concat(a)
.StringBuffer
using new
without passing a size parameter. Builds string using repeated calls to buf.append()
followed by buf.toString()
. Since this will resize the buffer on each append
call, this should be significantly slower than the others.StringBuffer
using new
passing a size large enough for the buffer to avoid growth. String built with repeated calls to buf.append()
followed by buf.toString()
.StringBuffer
from the pool, requesting a size large enough for the buffer to avoid growth. String built with repeated calls to buf.append()
followed by buf.toString()
, and then returning the string to the pool. With the pool in place, only one StringBuffer
object is ever allocated.When StringBufferSpeedTest
is run on a 1GHz Dell OptiPlex GX200
under Win2000 and Sun's JDK 1.4.1, the results are as follows:
Test | Execution Times (ms) | ||
---|---|---|---|
Test 1 | Test 2 | Test 3 | |
String Addition | 80 | 80 | 70 |
String Naïve | 220 | 220 | 221 |
String.concat | 61 | 50 | 60 |
Unsized Buffer | 80 | 71 | 70 |
StringBuffer | 60 | 60 | 60 |
Pooled Buffer | 170 | 180 | 180 |
When StringBufferSpeedTest
is run on a 400MHz G3 under Mac OS
X.2 and Apple's JDK 1.3.1, the results are as follows:
Test | Execution Times (ms) | ||
---|---|---|---|
Test 1 | Test 2 | Test 3 | |
String Addition | 272 | 464 | 573 |
String Naïve | 1000 | 989 | 1182 |
String.concat | 458 | 382 | 357 |
Unsized Buffer | 285 | 414 | 390 |
StringBuffer | 290 | 197 | 166 |
Pooled Buffer | 1711 | 1096 | 1305 |
The primary conclusion is that memory allocation is now so fast that pooling objects is not necessary. In fact, the overhead of the pool actually takes longer than allocating memory. This is not typical to most languages.
It is probable that the pool code could be made faster (perhaps using a mechanism other than buckets). Even so, it would need to be around three to eight times faster to even equal the performance of the code it is supposed to be speeding up.
Another surprise is the speed of String.concat
in the JDK 1.4.1
case. If you Google for tips on improving Java performance, nearly all of them
suggest that the technique used in the "StringBuffer" test should be the
fastest and that using only String
objects is a performance
no-no. If this used to be true, it clearly isn't any more, as
String.concat
seems to be highly optimized.
This does not seem to be the case with the "String Naïve" technique,
which is significantly slower than expected. Clearly this is technique to be
completely avoided. (Though, it seems like a compiler could be optimized to
detect the pattern used in this technique and replace it with a
concat
call.) Conversely, the "Unsized Buffer" technique is faster
than expected. Again, based on performance tips on the internet, this technique
has the reputation of being quite slow.
This test also illustrates that Apple still has pretty far to go with the performance levels of their Java virtual machine.
Back to Wordman's Writing page.