We recently overhauled the GreenHopper ranking implementation, and the new “Global Rank” used on the Rapid Board relies on ActiveObjects (AO) for data storage. Since we’re storing and reading large amounts of data (at least one row per issue), we’ve been hitting the limits of AO’s read capabilities.
Reading from ActiveObjects
When using AO, you define an interface of your entity with some annotations that influence how it’s mapped to the database table AO manages. This is the interface for GreenHopper’s Global Rank entity:
Once the entity is registered in atlassian-plugin.xml, that’s pretty much all that’s needed to get ActiveObjects to work. But what is the actual object you get back after calling find/get on AO?
AO follows the Active Record Pattern, so what you’ll get is a proxy implementation of the interface you defined. This proxy holds its own caches for data, resources to enable lazy loading and relation resolution etc. Reflection is used intensively on the interface the proxy implements, for example for mapping field names to table columns and reading annotations. All this makes the proxy object powerful, but fairly heavyweight if all you’re after is reading data.
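To make the cost of the proxy approach more concrete, here is a minimal sketch of a reflection-backed proxy built with the JDK's java.lang.reflect.Proxy. It is not AO's code: the RankEntry interface, the column names and the naming rule in toColumnName are all assumptions made up for illustration. It only shows the general idea of answering getter calls by inspecting the method name on every invocation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class ProxySketch {

    // Hypothetical entity interface, standing in for an AO entity.
    public interface RankEntry {
        long getIssueId();
        long getPosition();
    }

    // Answers getter calls from a map of column values. The column name
    // is derived from the method name on every single call.
    static final class RowHandler implements InvocationHandler {
        private final Map<String, Object> columns;

        RowHandler(Map<String, Object> columns) {
            this.columns = columns;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            String name = method.getName();
            if (name.startsWith("get") && method.getParameterCount() == 0) {
                return columns.get(toColumnName(name.substring(3)));
            }
            // Setters, save(), toString() etc. are out of scope for this sketch.
            throw new UnsupportedOperationException(name);
        }
    }

    // Assumed naming rule for this sketch: "IssueId" becomes "ISSUE_ID".
    public static String toColumnName(String property) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < property.length(); i++) {
            char c = property.charAt(i);
            if (i > 0 && Character.isUpperCase(c)) {
                sb.append('_');
            }
            sb.append(Character.toUpperCase(c));
        }
        return sb.toString();
    }

    // Wraps one row of column values in a proxy implementing RankEntry.
    public static RankEntry wrap(Map<String, Object> row) {
        return (RankEntry) Proxy.newProxyInstance(
                RankEntry.class.getClassLoader(),
                new Class<?>[] { RankEntry.class },
                new RowHandler(row));
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("ISSUE_ID", 10001L);
        row.put("POSITION", 42L);
        RankEntry entry = wrap(row);
        System.out.println(entry.getIssueId() + " -> " + entry.getPosition());
    }
}
```

Even in this stripped-down form, every getter call goes through a reflective dispatch and a string transformation. A full Active Record proxy adds caching, lazy loading and relation handling on top of that.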
The reflection calls and overall complexity make reads CPU-heavy, and the fact that AO fully memory-buffers (and internally caches) the proxies it returns makes them memory-intensive as well.
Our benchmarks for reading 500,000 rows were at 3.5 minutes and about 1.4 GB heap usage. And we were just getting started, so we needed an improved, more efficient API for large data read operations.
This was our initial implementation. As you can see, the AO call returns an array of proxies, so all data is read into memory.
To solve this problem, we’ve added two new methods to the AO API:
public void stream(Class type, EntityStreamCallback streamCallback);
iterates over all rows in the table and lets you specify a callback implementation that gets called with a lightweight read-only proxy implementation of the entity interface.
public void stream(Class type, Query query, EntityStreamCallback streamCallback);
does the same, but lets you specify a query, limiting the rows and columns to be streamed back.
This is the new implementation in GreenHopper:
Instead of returning an array of proxies, the callback method onRowRead is called with a single proxy object for each row in the resultset. We then pass that on directly to another callback for consumption, and the proxy can be garbage collected once the method returns.
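The difference between the two read styles can be sketched without AO at all. The example below uses stand-in types only: Row, RowCallback and FakeTable are hypothetical names invented for this illustration, and RowCallback merely borrows the onRowRead method name described above rather than reproducing AO's real interface. It contrasts a read that materialises every row with one that hands rows to a callback one at a time.

```java
public class StreamSketch {

    // Hypothetical row type standing in for a read-only entity proxy.
    public static final class Row {
        public final long issueId;
        public final long position;

        Row(long issueId, long position) {
            this.issueId = issueId;
            this.position = position;
        }
    }

    // Stand-in for a per-row callback (not AO's actual interface).
    public interface RowCallback {
        void onRowRead(Row row);
    }

    // Simulated table: rows are created on demand instead of read from a database.
    public static final class FakeTable {
        private final int size;

        public FakeTable(int size) {
            this.size = size;
        }

        private Row read(int index) {
            return new Row(10_000L + index, index);
        }

        // Array-style read: all rows are held in memory at the same time.
        public Row[] findAll() {
            Row[] rows = new Row[size];
            for (int i = 0; i < size; i++) {
                rows[i] = read(i);
            }
            return rows;
        }

        // Streaming read: each row is passed to the callback and not retained,
        // so it becomes eligible for garbage collection once the callback returns.
        public void stream(RowCallback callback) {
            for (int i = 0; i < size; i++) {
                callback.onRowRead(read(i));
            }
        }
    }

    // Consumes the table through the streaming path, keeping only a running total.
    public static long sumPositions(FakeTable table) {
        long[] sum = {0};
        table.stream(row -> sum[0] += row.position);
        return sum[0];
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable(500_000);
        System.out.println(sumPositions(table));
    }
}
```

With findAll, peak memory grows with the number of rows. With stream, the caller decides what to keep, which in this sketch is a single long.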
The read-only proxy implementation has been stripped of much of the regular ActiveRecord proxy’s functionality: it has no lazy-loading or saving capabilities, and its cache isn’t managed by AO. The reflection calls have been mostly externalised, leaving a lightweight, read-only implementation.
We ran our previous benchmark of 500,000 rows again, this time against the stream API. The rows were read and processed in 5 seconds, temporarily taking about 125 MB of heap, which could even be garbage collected during the call since we’re immediately discarding the proxies after each row is done.
The API additions will be available with AO 0.17, which will be shipped in JIRA 4.4.2.