Why Use XML?

I had an interesting experience today when I attended a lunchtime presentation about XML. Myself, I love XML and what you can do with it, and my default answer to the question “Why use XML?” would be “It’s way cool”. However, for the more business-minded folks out there, perhaps I can provide a few answers as to why you might be interested in using XML on your future software projects. And thank you to Bob Agnew and Jeff Abbot for some new insight into how XML can be used and viewed while you are doing your design work.

1. XML is a common format for containing data sent from one software system to another – Prior to XML, when one piece of software needed to communicate with another, developers would need to define both a communication protocol and a data format for the exchange. With the advent of the web and the widespread recognition of TCP/IP and of HTTP, which sits on top of it, the communication layer is taken care of, but the data still needs to be formatted in some common way. XML provides a way to do this that is modular and that can preserve relationships between groups of data through nesting, which is helpful when pulling data from relational databases.
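To make the point about preserving relationships concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The table rows, element names, and the build_company_xml helper are all invented for illustration; the idea is only that a parent/child relationship between two tables can be carried as nesting in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might come back from two related tables.
DEPARTMENTS = [(1, "Payroll"), (2, "Shipping")]                   # (id, name)
EMPLOYEES = [(10, 1, "Ada"), (11, 1, "Grace"), (12, 2, "Linus")]  # (id, department_id, name)


def build_company_xml(departments, employees):
    """Nest each employee under its department so the relationship travels with the data."""
    root = ET.Element("company")
    for dept_id, dept_name in departments:
        dept = ET.SubElement(root, "department", id=str(dept_id), name=dept_name)
        for emp_id, emp_dept_id, emp_name in employees:
            if emp_dept_id == dept_id:
                emp = ET.SubElement(dept, "employee", id=str(emp_id))
                emp.text = emp_name
    return ET.tostring(root, encoding="unicode")


company_xml = build_company_xml(DEPARTMENTS, EMPLOYEES)
```

The receiving system can read the department/employee relationship straight from the structure, without knowing anything about the foreign key that produced it.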

So, why is this a good thing? Well, if I have System A over here that has payroll information and you have System B over there that has tax tables, with XML it’s a lot easier to pull in the tax table data and use it to transform the payroll information into paychecks for employees. Both systems have a common language between them, instead of System A describing its information in gobbledygook and System B using higgledy-piggledy, which perhaps requires a System C in the middle to translate. Both systems ‘understand’ XML and can parse each other’s documents. This also means that System A doesn’t have to know the structure of System B’s database tables, because it’s getting a container of data that holds just what it needs for its processing. This can be a big help in reducing targets for hacking. For example, only the tax table information is made available to System A, even though other records are kept in the same database. As long as the XML document is the only thing System B hands over, System A receives only the object it needs and has no means to explore for other objects that might be of interest to someone else.
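As a rough sketch of the System A and System B exchange, the Python below parses a tax table document and applies it to a gross pay figure. The document layout, the bracket figures, and the function names are all made up for the example; a real tax table would look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that System B might send. Names and numbers are invented.
TAX_TABLE_XML = """<taxTable>
  <bracket min="0" max="20000" rate="0.10"/>
  <bracket min="20000" max="50000" rate="0.20"/>
  <bracket min="50000" rate="0.30"/>
</taxTable>"""


def load_brackets(xml_text):
    """Turn the XML into (lower, upper, rate) tuples; a missing max means no upper limit."""
    root = ET.fromstring(xml_text)
    brackets = []
    for bracket in root.findall("bracket"):
        lower = float(bracket.get("min"))
        upper = float(bracket.get("max")) if bracket.get("max") is not None else float("inf")
        brackets.append((lower, upper, float(bracket.get("rate"))))
    return brackets


def tax_due(gross, brackets):
    """Tax each slice of pay at the rate of the bracket it falls into."""
    total = 0.0
    for lower, upper, rate in brackets:
        if gross > lower:
            total += (min(gross, upper) - lower) * rate
    return total


def net_pay(gross, xml_text=TAX_TABLE_XML):
    """System A's side: it only ever sees the XML, never System B's tables."""
    return gross - tax_due(gross, load_brackets(xml_text))
```

Notice that nothing in System A's code refers to how System B stores its data. If System B reorganizes its database, System A keeps working as long as the document keeps the same shape.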

2. XML is very human readable – There are already some standard data formats, such as comma-delimited or other character-delimited files, which can be used to transfer data from one system to another, and these formats still have their uses. However, if a human being has to debug a problem with the data, XML makes it much easier because the tags are right there next to the data and the reader doesn’t have to keep referring back to a column or field heading that might be on another page. The tags also tend to be more descriptive, and attributes can be used to give more meaning to the data. For example, if a money amount is being sent across, it can be enclosed in a money tag with currency and language attributes that tell a presentation layer how the amount should be displayed.
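Here is a small Python sketch of the money example. The money element, its currency and language attributes, and the two formatting rules are assumptions made for illustration, not any standard vocabulary.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical symbol table; a real presentation layer would know many more currencies.
SYMBOLS = {"USD": "$", "EUR": "€"}


def display_money(xml_text):
    """Format a <money> element using its currency and language attributes."""
    elem = ET.fromstring(xml_text)
    amount = Decimal(elem.text.strip())
    currency = elem.get("currency")
    language = elem.get("language", "en")
    symbol = SYMBOLS.get(currency, currency)
    text = f"{amount:.2f}"
    if language == "fr":
        # Assumed French convention for this sketch: decimal comma, symbol after the amount.
        return f"{text.replace('.', ',')} {symbol}"
    return f"{symbol}{text}"


french = display_money('<money currency="EUR" language="fr">1234.50</money>')
english = display_money('<money currency="USD" language="en">19.99</money>')
```

Compare the element with a bare 1234.50 sitting in the seventh column of a delimited file: the XML version says what the number is and how it is meant to be shown, right where the number is.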

3. XML is content oriented and separated from the logic and presentation layers – XML is where the content for a page or document is defined. As systems are upgraded, the content is not likely to change. In the money example, the amount is still going to be displayed on a page having to do with purchase requests or receipts. However, the way that amount is calculated and the way it looks can change.

For example, think about a login page for a system that’s going to be used where the users speak several different languages. The page is likely to have a greeting on it, along with a username and password box and some kind of submit button. That content will stay the same; it is the presentation portion that changes, based upon the language chosen. So, if French is chosen, the greeting will display Bonjour! as opposed to Welcome if English is chosen. The content stays the same, which means that the page can be tested once to make sure all of the elements exist on the page. When a new language is added, the content definition doesn’t change, just the data being pulled in, which depending on the situation can reduce the testing because things like the logic don’t necessarily need to be retested.
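One way to picture the login page is sketched below in Python. The page definition, the textKey attribute, and the translation table are all hypothetical; the point is that the XML lists what is on the page, while the words shown to the user come from somewhere else.

```python
import xml.etree.ElementTree as ET

# Hypothetical content definition: what is on the page, with no display text in it.
LOGIN_PAGE_XML = """<page name="login">
  <greeting textKey="greeting"/>
  <field name="username" textKey="username"/>
  <field name="password" textKey="password"/>
  <submit textKey="submit"/>
</page>"""

# Hypothetical per-language text, kept apart from the content definition.
TEXTS = {
    "en": {"greeting": "Welcome", "username": "Username",
           "password": "Password", "submit": "Log in"},
    "fr": {"greeting": "Bonjour!", "username": "Nom d'utilisateur",
           "password": "Mot de passe", "submit": "Connexion"},
}


def render_labels(language, page_xml=LOGIN_PAGE_XML):
    """Return the text for each page element, in page order, for one language."""
    page = ET.fromstring(page_xml)
    return [TEXTS[language][elem.get("textKey")] for elem in page]


english_labels = render_labels("en")
french_labels = render_labels("fr")
```

Adding a third language here means adding one more entry to TEXTS. LOGIN_PAGE_XML and render_labels stay exactly as they are, which is the part that would not need retesting.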

4. XML has preliminary error checking of content built in – XML makes use of the idea of a Document Type Definition (DTD), which describes what a valid document of a given type looks like. So, what does this mean? It means that a document can be checked to make sure that the content and any data included follow the structure the DTD lays out. So, suppose you have a company that has defined an XML DTD for purchase orders they receive from their customers. By requiring the customers to validate their XML document before sending it in, and by also verifying any purchase orders received against this DTD, the company knows that the structure of the data has been validated. In other words, where a money amount is required, the element is there, an attribute such as the currency takes one of a predefined list of values, the last name and first name fields are present, etc. (A DTD checks structure rather than data types, so confirming that an amount is actually a number takes a schema language such as XML Schema or a check in the application.) Now, a lot here depends on how good the DTD is, and they aren’t necessarily simple to write, but once you have one worked out, you can reduce the number of errors and the amount of review required, along with the back and forth communications that can occur because of the errors. There will still need to be review of the purchase orders, but the obvious things, like missing names or amounts, will already have been caught, and the quality of the data is what’s being verified instead.
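To give a feel for what such a DTD might contain, here is a hypothetical one for a very small purchase order, followed by a Python sketch of the kind of checks it implies. One loud caveat: Python's standard library does not validate against DTDs, so the function below is a hand-written stand-in covering a few of the rules, not real DTD validation. In practice you would hand the DTD to a validating parser (the third-party lxml library is one option).

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD, shown for reference only. The code below does not parse it.
PURCHASE_ORDER_DTD = """
<!ELEMENT purchaseOrder (firstName, lastName, amount)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT amount (#PCDATA)>
<!ATTLIST amount currency (USD|EUR|GBP) #REQUIRED>
"""

# The same rules written out by hand for this sketch.
REQUIRED_CHILDREN = ["firstName", "lastName", "amount"]
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}


def check_purchase_order(xml_text):
    """Return a list of problems; an empty list means these hand-written checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    if root.tag != "purchaseOrder":
        problems.append(f"root element is {root.tag}, expected purchaseOrder")
    children = [child.tag for child in root]
    if children != REQUIRED_CHILDREN:
        problems.append(f"expected children {REQUIRED_CHILDREN}, found {children}")
    amount = root.find("amount")
    if amount is not None and amount.get("currency") not in ALLOWED_CURRENCIES:
        problems.append("amount needs a currency attribute of USD, EUR, or GBP")
    return problems


GOOD_ORDER = ('<purchaseOrder><firstName>Ada</firstName><lastName>Lovelace</lastName>'
              '<amount currency="USD">42.00</amount></purchaseOrder>')
MISSING_NAME = '<purchaseOrder><firstName>Ada</firstName><amount currency="USD">42.00</amount></purchaseOrder>'
```

Running check_purchase_order on MISSING_NAME reports the missing lastName before a person ever has to look at the order, which is the back and forth the DTD is meant to save.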

I think those are the main points, although it feels like there are some other points that should be included. If you can think of other reasons to use XML, I hope you’ll add them into the comments section for others to read about as well.