Minimal IT - Representing table structures in memory

12 August 2014

Representing table structures in memory

The design challenges Metrici has faced with summary data are common to lots of systems, and the solution we have adopted could be used elsewhere.

Metrici holds data across multiple nodes and members, each of which is physically implemented as a separate row in the database. This is very versatile but can be slow to gather, process and display large volumes of data. This problem is not unique to Metrici - many systems need to manipulate data in summary form, rather than record-at-a-time.

We have tried lots of different approaches. We had one approach for displaying tables of data, one for holding evaluation scores from assessments, another for holding reporting data, one for creating charts, one for importing data, another for exporting data, and so on. We used both XML and JavaScript objects. Although each approach was a reasonable response to different aspects of design, the overall result was a muddle.

To overcome this, we standardised on a single way of representing summary data, and rebuilt components around this. Although not optimised for any single purpose, the ability to reuse the same components across the system has made everything much simpler.

The basic idea of the standard is to use JavaScript Object Notation (JSON) to represent summary data as tables. The structure of each table is very simple:

Each table is an object.
Each table has an array of column objects called "columns".
Each column has name, reference and type properties.
Each table has an array of row objects called "rows".
Each row object contains a set of fields.
The names of the fields match the references of the columns.
The values of the fields match the data types in the corresponding column.
Each table can have an options object to control processing.

The full standard is documented in the Metrici table structure documentation. As a simple example, the table:

Name	Species
Fido	Dog
Tiddles	Cat
Jaws	Goldfish

Can be represented using the structure:

{
    "columns": [
        {
            "reference": "name"
            "name": "Name",
            "type": "text"
        },
        {
            "reference": "species",
            "name": "Species",
            "type": "text"
        }
    ],
    "rows": [
        {
            "name": "Fido",
            "species": "Dog"
        },
        {
            "name": "Tiddles",
            "species": "Cat"
        },
        {
            "name": "Jaws",
            "species": "Goldfish"
        }
    ]
}

The same structure is used throughout Metrici for different purposes. It is used to efficiently extract data from the triple store, including incrementally recalculating only those parts of a table that have changed. It is used for set-based operations such as filtering and data aggregation. It is used to display data as HTML tables, blocks of content, or lists. It is used to pass data to charting routines. It is used as an in-memory structure, but it is also written to and read from node members to persist tables in the database. It is particularly effective with Metrici's derivation processing, efficiently providing summarised data that is automatically kept up to date. It is very fast indeed.

The standard we have adopted for representing tables in memory has proved really useful in many different scenarios, and I would recommend it to anyone facing similar design problems. Over the next few weeks I will show in more detail how it is used in different parts of Metrici.