12 August 2014

Representing table structures in memory

By Andrew Clifford

The design challenges Metrici has faced with summary data are common to lots of systems, and the solution we have adopted could be used elsewhere.

Metrici holds data across multiple nodes and members, each of which is physically implemented as a separate row in the database. This is very versatile but can be slow to gather, process and display large volumes of data. This problem is not unique to Metrici - many systems need to manipulate data in summary form, rather than record-at-a-time.

We have tried lots of different approaches. We had one approach for displaying tables of data, one for holding evaluation scores from assessments, another for holding reporting data, one for creating charts, one for importing data, another for exporting data, and so on. We used both XML and JavaScript objects. Although each approach was a reasonable response to different aspects of design, the overall result was a muddle.

To overcome this, we standardised on a single way of representing summary data, and rebuilt components around this. Although not optimised for any single purpose, the ability to reuse the same components across the system has made everything much simpler.

The basic idea of the standard is to use JavaScript Object Notation (JSON) to represent summary data as tables. The structure of each table is very simple:

The full standard is documented in the Metrici table structure documentation. As a simple example, the table:

NameSpecies
FidoDog
TiddlesCat
JawsGoldfish

Can be represented using the structure:

{
"columns": [
{
"reference": "name"
"name": "Name",
"type": "text"
},
{
"reference": "species",
"name": "Species",
"type": "text"
}
],
"rows": [
{
"name": "Fido",
"species": "Dog"
},
{
"name": "Tiddles",
"species": "Cat"
},
{
"name": "Jaws",
"species": "Goldfish"
}
]
}

The same structure is used throughout Metrici for different purposes. It is used to efficiently extract data from the triple store, including incrementally recalculating only those parts of a table that have changed. It is used for set-based operations such as filtering and data aggregation. It is used to display data as HTML tables, blocks of content, or lists. It is used to pass data to charting routines. It is used as an in-memory structure, but it is also written to and read from node members to persist tables in the database. It is particularly effective with Metrici's derivation processing, efficiently providing summarised data that is automatically kept up to date. It is very fast indeed.

The standard we have adopted for representing tables in memory has proved really useful in many different scenarios, and I would recommend it to anyone facing similar design problems. Over the next few weeks I will show in more detail how it is used in different parts of Metrici.