3 March 2009

What's really important?

By Andrew Clifford

You can manage IT much more effectively by asking the question "What's really important?"

IT is complicated. It has lots of different parts - hardware, software, working practices - and each part can be broken down into more and more detail.

For example, consider system recovery (often known as disaster recovery).

You can look at the detail of system recovery. You can look at technologies such as RAID and remote mirroring. You can consider recovery point objectives and recovery time objectives. You can examine procedures for backups. You can look at how system recovery fits into broader business contingency. Each of these can be expanded into almost infinite detail.

But asking the question "What's really important about system recovery?" makes everything simpler. There are very few things that really matter: that you know what the recovery objectives are; that you have the capability to meet the recovery objectives; and that you have successfully tested the recovery.

You can then characterise different scenarios. You can think of the ideal, the minimum acceptable, and the unacceptable.

The ideal system recovery scenario is that recovery objectives are clearly understood because they are part of a broader business contingency or continuity plan; facilities are in place to achieve recovery; and recovery has been recently tested and found to meet objectives.

The minimum acceptable scenario is that there is an agreed view of what IT should be recovered; facilities can be made available; and reasonable recovery has been demonstrated.

There are a number of unacceptable scenarios. It may be that there is no generally accepted view of what IT should be recovered; that there are no recovery facilities; that recovery would not be possible without significant damage to the business; that important aspects of the recovery objectives cannot be met; or that recovery has never been tested.

To give another example, what's really important about testing? Ideally, tests achieve a high level of coverage, the tests are passed, and the tests are modified before the code is changed so that they can guide the developers. It is acceptable that there are test plans and that they are passed. It is not acceptable to have no test plans, to test very little, or to have a system that seriously fails its tests.

You can repeat this across all parts of IT, to get a simple view of everything that's important to managing IT in your organisation.

Starting from this simple high-level view has many advantages.

Working at the high level helps communication. You can use the "really important" concepts as a focus for conversations with your colleagues outside the IT organisation, and still relate them to the technical detail within it.

With a simple scoring and weighting scheme, answering the questions for each of your systems gives you an insight into which systems and which aspects of IT are priorities. It is much more efficient to start at the high level and drill down to the detail where you need it than to start with the detail and then try to summarise.
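As an illustration only, here is a minimal Python sketch of one possible scoring and weighting scheme. The article does not prescribe any numbers, so the score values, the weights, the system names and the function names below are all assumptions made up for the example. Each system is rated ideal, acceptable or unacceptable for each aspect of IT, and the weighted result is used to rank systems and to show where to drill down.

```python
# Hypothetical scale: the article names the three levels but assigns no numbers.
SCORES = {"ideal": 2, "acceptable": 1, "unacceptable": 0}

# Hypothetical weights: how much each aspect matters to this organisation.
WEIGHTS = {"system recovery": 3, "testing": 2}


def weighted_score(ratings):
    """Return a system's weighted score as a fraction between 0 and 1."""
    total_weight = sum(WEIGHTS[aspect] for aspect in ratings)
    earned = sum(WEIGHTS[aspect] * SCORES[level] for aspect, level in ratings.items())
    return earned / (total_weight * max(SCORES.values()))


def priorities(systems):
    """Return system names ordered from lowest score (most urgent) to highest."""
    return sorted(systems, key=lambda name: weighted_score(systems[name]))


def weak_aspects(ratings):
    """Return the aspects rated unacceptable: the places to drill into detail."""
    return [aspect for aspect, level in ratings.items() if level == "unacceptable"]


# Made-up answers to the "really important" questions for two made-up systems.
systems = {
    "payroll": {"system recovery": "ideal", "testing": "acceptable"},
    "web shop": {"system recovery": "unacceptable", "testing": "ideal"},
}

if __name__ == "__main__":
    for name in priorities(systems):
        score = weighted_score(systems[name])
        print(f"{name}: {score:.2f}, unacceptable: {weak_aspects(systems[name])}")
```

In this made-up data the web shop ranks as the higher priority because its heavily weighted system recovery rating is unacceptable. That is the point at which you would drill down into the detail of recovery for that one system, rather than for every system.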

Perhaps most importantly, this approach lets you do the whole job. You simply can't manage all the detail, but if you focus on what's really important you can grasp all the parts that make up IT. And managing the whole, not just some of the parts, is what's really important.