23 December 2008

Don't ignore your database, use it

By Andrew Clifford

Programmers beware - you will suffer serious performance problems if you ignore the database and do too much data manipulation in the code.

A little while ago, we hit a few performance problems with our Metrici Advisor service. The underlying problem is that our database has grown significantly, and that we need to move to more powerful servers.

However, I think it is always a mistake just to throw hardware at performance problems, so I had a look in more detail at the problems.

Most of the problems were straightforward. We needed a few new indexes for the database. We could speed up some of the web pages by using more XSLT, and less other scripting.

By far the biggest problem was how we used the database.

Metrici Advisor is a database-centric application. It has a reasonably complicated database structure. We do not access the database directly from the user interface code. We use a thin layer of Java code to turn a meaningful request into one or more database accesses, and to interpret and return the results. A typical transaction runs a query or two to check permissions, and then a query or two to retrieve results.

To keep the queries simple, we encapsulate some of the commonly used queries, such as permission checking, into reusable methods. These utility routines work well, and simplify the rest of the Java code and database access. However, without thinking through the consequences, I had built complicated code using these methods.

For example, the code which works out what assessment questionnaires someone may use needs to look at licenses, permissions, and versions. To keep this simple, I had split it down in the code. The code looked at all the licenses for the user, then all questionnaires permitted by the license, then version information, and so on.

It was only a few lines of Java code, and relatively easy to follow. It worked fine in most cases. But this sort of code does not scale. The system almost ground to a halt for one of our more active users, who has many separate licenses, because all the code was being repeated for each license.

To fix the problem, we moved all the data access and all the logic to where it belongs - the database. The database is the correct place to select the right data, to merge it, to sort it, to apply business rules. We developed one big query, rather than lots of little queries joined together with code.

The root of the problem is that we mixed different design approaches. We could have built an object-oriented system, with objects that represent the concepts of the system and which efficiently manage their own data access. But we decided instead to build a database application with simple procedural code.

Either approach could work, but a mix does not. We should not have tried to build complicated data access from simple building blocks and expect it to work efficiently, without having put the engineering effort into the building blocks in the first place.

We have to accept the opportunities and limitations of the database-centric design, and build the system accordingly. We have to use the database, not ignore it.