7 February 2012

Technical analysis

By Andrew Clifford

The technical content of a piece of work can hide a more complicated requirement that needs significant analysis.

We all understand files on the web. Web-based email allows you to attach files to messages. You can upload photos to Facebook. You can download software from SourceForge. Although it might require some technical programming to implement, we all know what we mean by file uploads, attachments and downloads.

If only it were that simple. Our experience of implementing file support in Metrici Advisor is a good example of how a supposedly technical requirement can hide a much more complicated piece of analysis.

First, we needed to understand what files were to be used for. Are the files going to be served as part of the website, for example as images and downloadable documents? Or do we just need a web-based file system for uploads and downloads?

We want files for three things: for web content, such as images; to present additional reference material, such as PDFs; and to allow users to attach files when performing assessments.

Using files for content means we might need to modify the files to make sure that they are usable, for example downsizing images so they fit on the page, or filtering uploaded web pages that might contain JavaScript. If you present uploaded files as part of your website, you cannot guarantee byte-for-byte equality between what is uploaded and what is presented.

Filtering web content is technically easy (we use AntiSamy), but understanding what to filter is harder. As well as filtering HTML, we need to identify and cope with other file types, like XML, that cannot be filtered but which might still be processed by the browser.
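
To illustrate the kind of decision involved, here is a minimal Python sketch of a per-type handling policy. It is not how Metrici Advisor works: the `Handling` categories, the policy table and the `decide_handling` function are all assumptions made up for this example. The idea is simply that HTML is filtered, a short list of known types is presented as-is, and anything else (including XML) is offered only as a download so that the browser does not process it.

```python
from enum import Enum


class Handling(Enum):
    SANITIZE = "sanitize"       # run through an HTML filter before presenting
    SERVE_AS_IS = "serve"       # present in the page unchanged
    DOWNLOAD_ONLY = "download"  # offer as a download so the browser does not render it


# Hypothetical policy table; a real one would come out of the analysis itself.
_POLICY = {
    "text/html": Handling.SANITIZE,
    "image/png": Handling.SERVE_AS_IS,
    "image/jpeg": Handling.SERVE_AS_IS,
    "image/gif": Handling.SERVE_AS_IS,
    "application/pdf": Handling.SERVE_AS_IS,
}


def decide_handling(content_type: str) -> Handling:
    """Choose how to treat an uploaded file, given its declared content type."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Anything not explicitly listed (XML, SVG, unknown types) is download-only.
    return _POLICY.get(media_type, Handling.DOWNLOAD_ONLY)
```

Defaulting unknown types to download-only means a new file type has to be analysed before it is ever shown as content, rather than after something goes wrong.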

We needed to think about how file storage is presented to the user. Is there a file upload area, or are files attached to other content?

In our design, each file is represented by an independent object, called a node. The nodes can be anywhere in the website structure, or grouped into upload areas.

We needed to decide what information to hold with each file. In a simple system, a file is just a file. But representing the file as a node lets us give it a title, a description, and other information.
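
As a rough picture of what "a file as a node" could mean, here is a small Python sketch. The `FileNode` class and all of its field names are hypothetical, chosen for this example only; they are not the actual Metrici Advisor data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileNode:
    """Hypothetical node that wraps an uploaded file with descriptive information."""
    node_id: str
    title: str
    description: str = ""
    original_filename: str = ""
    content_type: str = "application/octet-stream"
    # Where the node sits in the site structure; None means it has no parent yet.
    parent_id: Optional[str] = None


# Example: a reference document attached under some other piece of content.
guide = FileNode(
    node_id="n42",
    title="Assessment guide",
    description="Reference material for assessors",
    original_filename="guide.pdf",
    content_type="application/pdf",
    parent_id="n7",
)
```

The file's bytes are deliberately absent from the node: the node carries the information about the file, and storage can be decided separately.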

We needed to design file storage. In a simple system, you might just store files in a directory on the web server, matching their file locations with their web addresses. But if you have a large number of files, you need to manage them differently, possibly using external storage such as Amazon S3, and then map between their storage locations and their web addresses. Our solution maps files on the server to web addresses, in a way that will allow us to use external storage in the future.
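
One way to keep that option open is to put the mapping behind a small interface. The Python sketch below shows the general idea only; `FileStore`, `LocalStore`, `ExternalStore` and `FileMap` are names invented for this example, the example URL is a placeholder, and nothing here describes our actual implementation or any real storage API.

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


class FileStore(ABC):
    """Hypothetical interface: turns a file identifier into a storage location."""

    @abstractmethod
    def location_for(self, file_id: str) -> str:
        ...


class LocalStore(FileStore):
    """Files on the web server, spread across subdirectories."""

    def __init__(self, root: str) -> None:
        self.root = PurePosixPath(root)

    def location_for(self, file_id: str) -> str:
        # Use the first two characters as a subdirectory so that no single
        # directory ends up holding every file.
        return str(self.root / file_id[:2] / file_id)


class ExternalStore(FileStore):
    """Files held by an external service, addressed by URL."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def location_for(self, file_id: str) -> str:
        return f"{self.base_url}/{file_id}"


class FileMap:
    """Maps web addresses to stored files, independently of where they are stored."""

    def __init__(self, store: FileStore) -> None:
        self.store = store
        self._ids: dict[str, str] = {}

    def register(self, web_path: str, file_id: str) -> None:
        self._ids[web_path] = file_id

    def resolve(self, web_path: str) -> str:
        return self.store.location_for(self._ids[web_path])


files = FileMap(LocalStore("/var/files"))
files.register("/reference/guide.pdf", "ab12cd")
print(files.resolve("/reference/guide.pdf"))
```

Because web addresses are looked up rather than derived from the directory layout, swapping `LocalStore` for `ExternalStore` changes where files live without changing any web address.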

None of these aspects is particularly hard to program, and there are well-established methods for each part of the implementation. What makes file uploads and attachments hard, like so much technical programming, is analysing the requirements and designing a solution that meets them.

We sometimes split work (and people) into "analysis" and "technical". But we should remember that technical work can involve a great deal of analysis too.