The New World of XML Documents

The days of closed, proprietary document formats are coming to an end. Government bodies and other organizations have begun to announce their intentions to shift to open standards that will make information more accessible and allow for more sustainable archiving processes. Of particular interest at the moment is XML. The eXtensible Mark-up Language (XML) is an open specification (publicly available for use) designed as a structured mark-up language that makes sharing data simpler across disparate applications, systems, and platforms. The data exchange capabilities of XML have powered innovations like Web services, RSS, and AJAX, which have in turn fueled the e-commerce and Web 2.0 revolutions. The flexibility of XML that has enabled the Internet to reinvent itself is beginning to be applied offline to the document types commonly used in business: word processing, spreadsheets, presentations, applications, forms, and the list goes on.

Software companies and consortiums have recognized both the market demand for open standards and the ability to create XML specifications that incorporate the benefits of existing document formats, such as meta-data and content layout. In response, they have created specifications that are now being approved by standards committees. XML as the underlying structure for document formats has reached the tipping point, and this will be a major shift in how businesses access documents and use data.

However, businesses will still need to access content that is locked in proprietary formats. Relying on outdated software to read document archives runs counter to the purpose of moving to XML-based documents. Solutions for managing older document formats, already an important factor, will become an even more critical component of businesses’ overall document strategy.

Why XML Makes Sense Now

The strengths of XML (the ability to separate content from presentation, transfer data across platforms, and create specifications for specific purposes) allow tighter integration of documents into workflow processes than was previously possible with binary formats. The announcement by Microsoft that Office 2007 uses Office Open XML as its underlying structure has helped push XML beyond the Web into mainstream applications. But the trend toward using XML began years ago.

In 2002, OASIS began work on the Open Document Format (ODF) specification, which the group approved in 2005. In November 2006, the ISO also approved ODF as an open standard. Around the same time, Microsoft submitted its Office Open XML (OOXML) specification and received open standard approval from Ecma International (a European standards group) in December 2006.

The greatest benefit of XML for documents is that anyone can use the specification to create an application that reads, edits, and writes the documents. (Microsoft, however, has tried to get around that openness by referencing older versions of software in its specs.) Additionally, using a standard XML schema allows any application designed to understand the XML specification to access the documents. This could mean opening the document for use or programmatically extracting information from it.
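To make "programmatically extracting information" concrete, here is a minimal Python sketch using only the standard library. The sample document and its element names (document, meta, author, created, body, para) are invented for illustration; they are not taken from ODF, OOXML, or any other real schema, and a real application would follow the element names defined by the specification it targets.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these element names are assumptions made for this
# sketch and do not come from ODF, OOXML, or any published schema.
SAMPLE_DOCUMENT = """<document>
  <meta>
    <author>J. Smith</author>
    <created>2006-12-01</created>
  </meta>
  <body>
    <para>Quarterly results improved.</para>
    <para>Costs were flat.</para>
  </body>
</document>"""


def extract_summary(xml_text):
    """Pull the meta-data and paragraph text out of the XML document."""
    root = ET.fromstring(xml_text)
    return {
        # findtext() returns the text of the first matching element.
        "author": root.findtext("meta/author"),
        "created": root.findtext("meta/created"),
        # iter() walks every <para> element, wherever it is nested.
        "paragraphs": [para.text for para in root.iter("para")],
    }


if __name__ == "__main__":
    print(extract_summary(SAMPLE_DOCUMENT))
```

Nothing in the sketch depends on the application that created the document. Any program that knows the element names can read the same data, which is the point the paragraph above makes about a shared schema.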

As the idea of XML-based documents continues to gain acceptance, more companies are positioning their own products to address the new landscape. Adobe is currently developing a new format, codenamed Mars, that uses an XML specification to create PDF files. Microsoft, meanwhile, is developing the XML Paper Specification (XPS) format to compete with PDF.

The Long-term Benefits of XML

Archiving documents in electronic formats rather than on paper had a huge impact on how companies were able to use old documents. With open specification formats like PDF or TIFF, companies could standardize their document archives and include meta-data, such as date created and author. But the content was only accessible through an application built to open the format.

Access to these archives can be streamlined by using a universal document viewing application that supports document formats currently in use as well as archival formats. These applications are often designed to be integrated into larger systems, enabling businesses to deploy them along with applications designed to read XML-based documents. Continued development and refinement of viewing applications means that businesses can confidently use them for their older file formats.

The move towards XML-based documents will be comparable to the migration from paper to electronic archiving. One of the main benefits of XML is that the data can be accessed without the need for an application to interpret the presentation. XML differs from other formats in that it can be opened as a simple text file, where the data can be read alongside the XML mark-up tags. XML files can also be accessed programmatically without the need to render the file into its final presentation format. This will eliminate the need for documents to be converted before moving into archives and extend their useful lifespan indefinitely.
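The two kinds of access described above can be sketched in a few lines of Python. The archived memo and its tag names (memo, to, subject, body, emphasis) are hypothetical and chosen only for illustration. The XML is first treated as ordinary text, then parsed to recover the data with no presentation layer involved.

```python
import xml.etree.ElementTree as ET

# Hypothetical archived memo: the tag and attribute names are assumptions
# made for this sketch, not part of any real document specification.
ARCHIVED_MEMO = """<memo date="2007-01-15">
  <to>Records Department</to>
  <subject>Retention schedule</subject>
  <body>Keep invoices for <emphasis>seven</emphasis> years.</body>
</memo>"""


def plain_text(xml_text):
    """Return the document's text content with all mark-up removed."""
    root = ET.fromstring(xml_text)
    # itertext() yields every piece of text in document order, skipping tags.
    raw = "".join(root.itertext())
    # Collapse the indentation and line breaks into single spaces.
    return " ".join(raw.split())


if __name__ == "__main__":
    # 1. As plain text: the data and the tags are both human-readable.
    print(ARCHIVED_MEMO)
    # 2. Programmatically: the content and meta-data, no rendering required.
    print(plain_text(ARCHIVED_MEMO))
    print(ET.fromstring(ARCHIVED_MEMO).get("date"))
```

The emphasis element is presentation-related mark-up, yet the text inside it is still recovered in the right place. No viewer or layout engine is needed to get at the content.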

Challenges Facing XML Adoption

The benefits of XML are significant, but in practice this ideal has some hurdles to overcome. The strength of XML as an open specification is also proving to be a weakness. Any company or group that has the time and resources to invest can create a document format based on an XML specification. The Open Document Format and Office Open XML are examples. Both are specifications for creating word processing documents using XML, but they are considerably different in their approach and are incompatible with each other (though some software is being written to convert one to the other). Microsoft is even developing separate XML schemas for its Office applications and its page presentation format (XPS).

Now that XML has been pushed to the forefront and past the tipping point for adoption, the question remains which specification will be accepted by mainstream businesses. Do the odds favor Microsoft because of its size and current installed base? Or does ISO approval of the Open Document Format signify a shift towards truly open documents, with businesses gravitating towards certified standards?

Whatever happens, documents based on XML are going to change the business landscape. Many of the changes will make documents more transparent and allow data to move more effectively into workflow processes. Organizations that already have a comprehensive document strategy in place will be able to harness the benefits of XML quickly and efficiently. 

While XML may still be in the future for your business, you can achieve better integration of your documents into business workflows now. Find out how document imaging software can help your current situation and prepare you for the coming changes.

