6 July 2010

XML-based programming 6: the dark side

By Andrew Clifford

Without care, XML can be difficult to understand, complicated to work with, and inefficient to process.

In the interest of balance, I want to present the negative side of using XML.

XML can be very complicated. Basic XML is easy, but once you get into namespaces, schemas and public standards, it can be a nightmare. The XML standards for Microsoft Office are over 5000 pages long.

XML is difficult to work with. It takes 20 lines of code to read a single value from an XML file using the W3C's document object model (DOM) programming interface.
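To give a sense of the ceremony involved, here is a minimal Java sketch that reads one value with the standard DOM classes in `javax.xml.parsers` and `org.w3c.dom`. The document layout (`<config><server><name>`) and the method name `readServerName` are hypothetical. The point is the number of steps needed for a single value, not this particular file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomReadValue {

    // Reads the text of the first <name> inside the first <server> element,
    // returning null if either element is missing.
    static String readServerName(String xml) {
        Document doc;
        try {
            // Step 1: a factory creates a builder, and the builder parses the input.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Step 2: the parser can report three distinct checked exceptions.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        // Step 3: walk down the tree one element at a time, checking each level.
        NodeList servers = doc.getElementsByTagName("server");
        if (servers.getLength() == 0) {
            return null;
        }
        Element server = (Element) servers.item(0);
        NodeList names = server.getElementsByTagName("name");
        if (names.getLength() == 0) {
            return null;
        }
        // Step 4: only now extract the text content.
        return names.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<config><server><name>alpha</name><port>8080</port></server></config>";
        System.out.println(readServerName(xml)); // prints "alpha"
    }
}
```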

XML can be very inefficient. I recently tested an XML-based routine to create 2000 questionnaires, which took 40 minutes of 100% CPU time and 500MB of memory.

Because of these problems, not everybody shares my enthusiasm for XML. Most people use a variety of scripts and files to construct, configure and test their systems; very few standardise on XML. Some use simpler standards such as JSON to represent data.

The underlying problem with XML is that because it is so flexible, methods for defining and processing XML have to cope with many different requirements. XML gives you a lot of rope: if you are lucky, you just get tied in knots; if you are unlucky, you hang yourself.

However, you do not need complicated XML if you are only using it to glue together your systems and to structure ancillary components like configuration files and test scripts. You can explain the rules for simple XML in less than one page. If you stick to these rules, which I find cover more than 90% of requirements, your XML will be as simple as any other data representation.

XML programming interfaces are complicated because they have to cope with all the different uses that XML can be put to. For simple XML, it is easy to create a wrapper that covers everything you want to do in a few function calls. You can process simple XML as easily as Java properties or simpler standards such as JSON, and still have the full flexibility of XML in reserve, if you need it.
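As an illustration of how small such a wrapper can be, here is a minimal Java sketch built only on the JDK's DOM and XPath classes. The class name `SimpleXml` and its methods `parse`, `get`, `getInt` and `getAll` are hypothetical, not an existing library. It assumes simple XML without namespaces, where each value can be identified by a path such as `/config/server/name`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper: parse once, then read each value with a single call.
public final class SimpleXml {
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    private SimpleXml(Document doc) {
        this.doc = doc;
    }

    // Parses an XML string; checked parser exceptions are rethrown unchecked.
    public static SimpleXml parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new SimpleXml(doc);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Text of the first node matching the path, or defaultValue if nothing matches.
    public String get(String path, String defaultValue) {
        Node node = (Node) evaluate(path, XPathConstants.NODE);
        return node == null ? defaultValue : node.getTextContent().trim();
    }

    // Same as get, converted to an int.
    public int getInt(String path, int defaultValue) {
        String value = get(path, null);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    // Text of every node matching the path, in document order.
    public List<String> getAll(String path) {
        NodeList nodes = (NodeList) evaluate(path, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    private Object evaluate(String path, QName returnType) {
        try {
            return xpath.evaluate(path, doc, returnType);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad path: " + path, e);
        }
    }

    public static void main(String[] args) {
        SimpleXml cfg = SimpleXml.parse(
                "<config><server><name>alpha</name><port>8080</port></server>"
                + "<user>ann</user><user>bob</user></config>");
        System.out.println(cfg.get("/config/server/name", "unknown")); // alpha
        System.out.println(cfg.getInt("/config/server/port", 80));     // 8080
        System.out.println(cfg.getAll("/config/user"));                // [ann, bob]
    }
}
```

Each read is one call with a default, which is roughly the shape of a Java properties lookup. Anything this wrapper does not cover can still be done directly with the standard DOM and XPath APIs.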

The inefficiencies of XML need to be put in perspective. In my example, I was using the XML language XSLT to manage execution, which is just plain wrong. When I converted this so that Java was managing the execution, even though I was still using XML-based calls, both the run time and the memory use more than halved, and the CPU requirement dropped to 10%. Core processing should remain in a general-purpose programming language, and the core data should remain on the database; XML is there to provide a common approach to all the rest.
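The general shape of "Java manages the execution, XML does the per-item work" can be sketched with the standard `javax.xml.transform` API: the stylesheet is compiled once into a `Templates` object, and a plain Java loop drives it over one small input document per item. The stylesheet, the `respondent` and `questionnaire` element names and the `generate` method are all hypothetical. This is not the author's code and makes no claim to reproduce the timings quoted in the paragraph.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class QuestionnaireBatch {

    // Hypothetical stylesheet: turns one <respondent> record into one <questionnaire>.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
            + "<xsl:template match='/respondent'>"
            + "<questionnaire for='{@name}'/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Java owns the loop; XSLT only transforms one small document at a time.
    static List<String> generate(List<String> names) {
        try {
            // Compile the stylesheet once and reuse it for every item.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
            List<String> results = new ArrayList<>();
            for (String name : names) {
                // Assumes names contain no XML special characters such as ' or <.
                String input = "<respondent name='" + name + "'/>";
                StringWriter out = new StringWriter();
                Transformer transformer = templates.newTransformer();
                transformer.transform(new StreamSource(new StringReader(input)),
                        new StreamResult(out));
                results.add(out.toString());
            }
            return results;
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String questionnaire : generate(List.of("ann", "bob"))) {
            System.out.println(questionnaire);
        }
    }
}
```

Control flow, iteration and any error handling stay in Java, where they are cheap; the stylesheet is reduced to a pure transformation of a single record.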

If you consider only one requirement in isolation, it is usually easier to use something other than XML. If you are writing Java, Java properties are easier than XML configuration files. If you are writing JavaScript, JSON is easier than XML. If you are writing a data manipulation script, bash is easier than XSLT. However, the great advantage of XML is that, with a little care, it can meet all of these requirements, and many more, with a single, consistent set of technologies.
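A small concrete case of the Java properties comparison: the JDK's `java.util.Properties` class reads both its plain `key=value` format (`load`) and a simple XML format of its own (`loadFromXML`), so the same settings can be written either way. The setting names and values below are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class ConfigFormats {

    // Two hypothetical settings written as a Java .properties file...
    static final String PROPS_TEXT =
            "app.name=Demo\n"
            + "app.port=8080\n";

    // ...and the same settings in the XML format that java.util.Properties also reads.
    static final String PROPS_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "  <entry key=\"app.name\">Demo</entry>\n"
            + "  <entry key=\"app.port\">8080</entry>\n"
            + "</properties>\n";

    static Properties fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text)); // line-oriented key=value format
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static Properties fromXml(String xml) {
        Properties props = new Properties();
        try {
            // The DOCTYPE's system URI only identifies the format; it is not fetched.
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties textProps = fromText(PROPS_TEXT);
        Properties xmlProps = fromXml(PROPS_XML);
        System.out.println(textProps.equals(xmlProps)); // true
    }
}
```

For this one job the properties form is shorter, which is the paragraph's point; the XML form carries the same data in a shape that general XML tools can also process.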