XML DTDs Vs XML Schema

Key Takeaways

XML DTDs (Document Type Definition) and XML Schema are both used to define the structure of an XML document, but XML Schema provides a more detailed and flexible approach, including support for data types and namespaces.
DTDs provide a basic method for defining an XML document’s structure, including the elements that may be included, their attributes, and their ordering and nesting. DTDs can be defined inline within an XML document or externally in a separate file.
XML Schemas offer a more powerful and object-oriented means to define XML document structure. They provide a set of basic types, such as integer, byte, string, and floating point numbers, which can be used to create custom complex types for defining elements in the XML document.
While XML Schemas provide more control and precision, DTDs may still be used for reasons such as compatibility with legacy systems that don’t fully support XML Schema, the existence of mature DTD definitions, or the need to avoid the overhead of XML Schema in high-performance applications.

XML is a very handy format for storing and communicating your data between disparate systems in a platform-independent fashion. XML is more than just a format for computers — a guiding principle in its creation was that it should be Human Readable and easy to create.

XML allows UNIX systems written in C to communicate with Web Services that, for example, run on the Microsoft .NET architecture and are written in ASP.NET. XML is, however, only the meta-language that both systems understand; they also need to agree on the format that the XML data will take. Typically, one of the partners in the process offers a service to the other, and that partner is in charge of the format of the data.

A formal definition of that format serves two purposes. First, it ensures that the data that makes it past the parsing stage is at least in the right structure. As such, it’s a first level at which ‘garbage’ input can be rejected. Second, the definition documents the protocol in a standard, formal way, which makes it easier for developers to understand what’s available.

DTD – The Document Type Definition

The first method used to provide this definition was the DTD, or Document Type Definition. This defines the elements that may be included in your document, what attributes these elements have, and the ordering and nesting of the elements.

The DTD is declared in a DOCTYPE declaration beneath the XML declaration contained within an XML document:

Inline Definition:

<?xml version="1.0"?> 

<!DOCTYPE documentelement [definition]>

External Definition:

<?xml version="1.0"?> 

<!DOCTYPE documentelement SYSTEM "documentelement.dtd">

The actual body of the DTD itself contains definitions in terms of elements and their attributes. For example, the following short DTD defines a bookstore. It states that a bookstore has a name, and stocks books on at least one topic.

Each topic has a name and zero or more books in stock. Each book has a title and an author, and carries its ISBN as an attribute. The name of the topic and the name of the bookstore are defined as the same type of element. These, like the title and author of the book, store #PCDATA (parsed character data), which is the only way a DTD can declare the text content of an element. The isbn attribute is declared as CDATA, plain character data, with a default value of "0":

<!DOCTYPE bookstore [ 

  <!ELEMENT bookstore (name,topic+)> 

  <!ELEMENT topic (name,book*)> 

  <!ELEMENT name (#PCDATA)> 

  <!ELEMENT book (title,author)> 

  <!ELEMENT title (#PCDATA)> 

  <!ELEMENT author (#PCDATA)> 

  <!ELEMENT isbn (#PCDATA)> 

  <!ATTLIST book isbn CDATA "0"> 

  ]>

An example of a book store’s inline definition might be:

<?xml version="1.0"?> 

<!DOCTYPE bookstore [ 

  <!ELEMENT bookstore (name,topic+)> 

  <!ELEMENT topic (name,book*)> 

  <!ELEMENT name (#PCDATA)> 

  <!ELEMENT book (title,author)> 

  <!ELEMENT title (#PCDATA)> 

  <!ELEMENT author (#PCDATA)> 

  <!ELEMENT isbn (#PCDATA)> 

  <!ATTLIST book isbn CDATA "0"> 

  ]> 

<bookstore> 

  <name>Mike's Store</name> 

  <topic> 

    <name>XML</name> 

    <book isbn="123-456-789"> 

      <title>Mike's Guide To DTD's and XML Schemas</title> 

      <author>Mike Jervis</author> 

    </book> 

  </topic> 

</bookstore>

Using an inline definition is handy when you only have a few documents and they’re offline, as the definition is always in the file. However, if, for example, your DTD defines the XML protocol used to talk between two separate systems, re-transmitting the DTD with each document adds an overhead to the communications. Having an external DTD eliminates the need to re-send it each time. We could remove the DTD from the document, and place it in a DTD file on a Web server that’s accessible by the two systems:

<?xml version="1.0"?> 

<!DOCTYPE bookstore SYSTEM "http://webserver/bookstore.dtd"> 

<bookstore> 

  <name>Mike's Store</name> 

  <topic> 

    <name>XML</name> 

    <book isbn="123-456-789"> 

      <title>Mike's Guide To DTD's and XML Schemas</title> 

      <author>Mike Jervis</author> 

    </book> 

  </topic> 

</bookstore>

The file bookstore.dtd would contain the full definition in a plain text file:

  <!ELEMENT bookstore (name,topic+)> 

  <!ELEMENT topic (name,book*)> 

  <!ELEMENT name (#PCDATA)> 

  <!ELEMENT book (title,author)> 

  <!ELEMENT title (#PCDATA)> 

  <!ELEMENT author (#PCDATA)> 

  <!ELEMENT isbn (#PCDATA)> 

  <!ATTLIST book isbn CDATA "0">

The lowest level of definition in a DTD is character data: element content is declared as #PCDATA (Parsed Character Data) and attribute values as CDATA (Character Data). We can only define an element’s content as text, and with this limitation, it is not possible, for example, to force an element to be numeric. Attributes can be restricted to an enumerated list of values, but they can’t be forced to be numeric either.

So, for example, if you stored your application’s settings in an XML file, it could be manually edited so that the window’s start coordinates were strings, and you’d still need to validate this in your code rather than have the parser do it for you.
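
To make the limitation concrete, here is a minimal sketch of such a settings file. The element and attribute names (settings, window, state, x, y) are hypothetical, chosen only for illustration. The state attribute is restricted to an enumerated list, but x and y can only be declared as CDATA, so a validating parser accepts the non-numeric coordinates shown.

```xml
<?xml version="1.0"?>
<!DOCTYPE settings [
  <!ELEMENT settings (window)>
  <!ELEMENT window EMPTY>
  <!-- state is limited to an enumerated list of values -->
  <!ATTLIST window state (normal|maximized|minimized) "normal">
  <!-- x and y can only be declared as text; a DTD has no numeric type -->
  <!ATTLIST window x CDATA #REQUIRED
                   y CDATA #REQUIRED>
]>
<settings>
  <!-- Valid against the DTD, even though the coordinates are not numbers -->
  <window state="maximized" x="left" y="top"/>
</settings>
```

Changing state to a value outside the list would be rejected by a validating parser, while the bad coordinates still have to be caught by application code.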

XML Schemas

XML Schemas provide a much more powerful means by which to define your XML document structure and limitations. XML Schemas are themselves XML documents. They reference the XML Schema Namespace, and even have their own DTD.

What XML Schemas do is provide an Object Oriented approach to defining the format of an XML document. XML Schemas provide a set of basic types. These types are much wider ranging than the basic PCDATA and CDATA of DTDs. They include most basic programming types such as integer, byte, string and floating point numbers, but they also expand into Internet data types such as ISO language codes with optional country subtags (en-GB for example). A full list can be found in the W3C specification XML Schema Part 2: Datatypes.

The author of an XML Schema then uses these core types, along with various operators and modifiers, to create complex types of their own. These complex types are then used to define an element in the XML Document.

As a simple example, let’s try to create a basic XML Schema for defining the bookstore that we used as an example for DTDs. Firstly, we must declare this as an XSD Document, and, as we want this to be very user friendly, we’re going to add some basic documentation to it:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">  

<xsd:annotation>  

  <xsd:documentation xml:lang="en">  

    XML Schema for a Bookstore as an example.  

  </xsd:documentation>  

</xsd:annotation>

Now, in the previous example, the bookstore consisted of the sequence of a name and at least one topic. We can easily do that in an XML Schema:

<xsd:element name="bookstore" type="bookstoreType"/>  

<xsd:complexType name="bookstoreType">  

  <xsd:sequence>  

    <xsd:element name="name" type="xsd:string"/>  

    <xsd:element name="topic" type="topicType" minOccurs="1" maxOccurs="unbounded"/>  

  </xsd:sequence>  

</xsd:complexType>

In this example, we’ve defined an element, bookstore, that will equate to an XML element in our document. We’ve defined it to be of type bookstoreType, which is not a standard type, and so we provide a definition of that type next.

We then define a complexType, which defines bookstoreType as a sequence of name and topic elements. Our name element is of type xsd:string, a type defined by the XML Schema Namespace, and so we’ve fully defined that element.

The topic element, however, is of type topicType, another custom type that we must define. We’ve also defined our topic element with minOccurs="1", which means there must be at least one topic element at all times, and with maxOccurs="unbounded", which means there is no upper limit to the number of topic elements that may be included. Both minOccurs and maxOccurs default to 1, so if we had specified neither, the result would be exactly one instance, as is the case for the name element. Next, we define the schema for the topicType.

<xsd:complexType name="topicType">  

  <xsd:sequence>  

    <xsd:element name="name" type="xsd:string"/>  

    <xsd:element name="book" type="bookType" minOccurs="0" maxOccurs="unbounded"/>  

  </xsd:sequence>  

</xsd:complexType>

This is all similar to the declaration of the bookstoreType, but note that we have to re-define our name element within the scope of this type. If we’d declared a named type for name, such as nameType, that was based on xsd:string and defined outside our other types, we could re-use it in both. However, to illustrate the point, I decided to define it within each section. XML Schema gets interesting when we get to defining our bookType:

<xsd:complexType name="bookType">  

  <xsd:sequence>  

    <xsd:element name="title" type="xsd:string"/>  

    <xsd:element name="author" type="xsd:string"/>  

  </xsd:sequence>  

  <xsd:attribute name="isbn" type="isbnType"/>  

</xsd:complexType>  

<xsd:simpleType name="isbnType">  

  <xsd:restriction base="xsd:string">  

    <xsd:pattern value="[0-9]{3}[-][0-9]{3}[-][0-9]{3}"/>  

  </xsd:restriction>  

</xsd:simpleType>

So the definition of the bookType is not particularly interesting. But the definition of its attribute “isbn” is. Not only does XML Schema support the use of types such as xsd:nonNegativeInteger, but we can also create our own simple types from these basic types using various modifiers. In the example for isbnType above, we base it on a string, and restrict it to match a given regular expression. Excusing my poor regex, that should limit any isbn attribute to three groups of three digits separated by dashes, as in 123-456-789.
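
For reference, the fragments above can be assembled into one complete schema file. This is a sketch rather than a definitive version: it assumes the child elements of each complex type are wrapped in xsd:sequence and that repeating elements carry maxOccurs="unbounded", both of which the schema language requires for the structure described, and it adds the closing xsd:schema tag that the fragments leave out.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      XML Schema for a Bookstore as an example.
    </xsd:documentation>
  </xsd:annotation>

  <!-- The document element -->
  <xsd:element name="bookstore" type="bookstoreType"/>

  <!-- A bookstore is a name followed by one or more topics -->
  <xsd:complexType name="bookstoreType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="topic" type="topicType" minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- A topic is a name followed by zero or more books -->
  <xsd:complexType name="topicType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="book" type="bookType" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- A book has a title and an author, plus an isbn attribute -->
  <xsd:complexType name="bookType">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="isbn" type="isbnType"/>
  </xsd:complexType>

  <!-- Three groups of three digits separated by dashes -->
  <xsd:simpleType name="isbnType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{3}[-][0-9]{3}[-][0-9]{3}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```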

This is just a simple example, but it should give you a taste of the many things you can do to control the content of an attribute or an element. You have far more control over what is considered a valid XML document using a schema. You can even

extend your types from other types you’ve created,
require uniqueness within scope, and
provide lookups.

It’s a nicely object oriented approach. You could build a library of complexTypes and simpleTypes for re-use throughout many projects, and even find other definitions of common types (such as an “address”, for example) from the Internet and use these to provide powerful definitions of your XML documents.
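
As a rough sketch of these features, the schema below declares a reusable nameType once, extends a base type to build a more specific one, and requires ISBNs to be unique within a topic. The names nameType, itemType and uniqueIsbn are hypothetical and do not appear in the bookstore schema above. Lookups are declared in a similar way, using xsd:key and xsd:keyref in place of xsd:unique.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A reusable named type: declared once, referenced wherever a name is needed -->
  <xsd:simpleType name="nameType">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="1"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- A base type for anything the store stocks (hypothetical) -->
  <xsd:complexType name="itemType">
    <xsd:sequence>
      <xsd:element name="title" type="nameType"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Extension: a book is an item plus an author and an isbn attribute -->
  <xsd:complexType name="bookType">
    <xsd:complexContent>
      <xsd:extension base="itemType">
        <xsd:sequence>
          <xsd:element name="author" type="nameType"/>
        </xsd:sequence>
        <xsd:attribute name="isbn" type="xsd:string" use="required"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="topic">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="nameType"/>
        <xsd:element name="book" type="bookType" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
    <!-- Uniqueness within scope: no two books under one topic share an ISBN -->
    <xsd:unique name="uniqueIsbn">
      <xsd:selector xpath="book"/>
      <xsd:field xpath="@isbn"/>
    </xsd:unique>
  </xsd:element>

</xsd:schema>
```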

DTD vs XML Schema

The DTD provides a basic grammar for defining an XML Document in terms of the metadata that comprise the shape of the document. An XML Schema provides this, plus a detailed way to define what the data can and cannot contain. It provides far more control for the developer over what is legal, and it provides an Object Oriented approach, with all the benefits this entails.

So, if XML Schemas provide an Object Oriented approach to defining an XML document’s structure, and if XML Schemas give us the power to define reusable types such as an ISBN number based on a wide range of pre-defined types, why would we use a DTD? There are in fact several good reasons for using the DTD instead of the schema.

Firstly, and rather an important point, is that XML Schema is a new technology. This means that whilst some XML Parsers support it fully, many still don’t. If you use XML to communicate with a legacy system, perhaps it won’t support the XML Schema.

Many system interfaces are already defined as a DTD. They are mature definitions, rich and complex. The effort in re-writing the definition may not be worthwhile.

DTD is also established, and examples of common objects defined in a DTD abound on the Internet — freely available for re-use. A developer may be able to use these to define a DTD more quickly than they would be able to accomplish a complete re-development of the core elements as a new schema.

Finally, you must also consider the fact that the XML Schema is an XML document. It has an XML Namespace to refer to, and an XML DTD to define it. This is all overhead. When a parser examines the document, it may have to link this all in, interpret the DTD for the Schema, load the namespace, and validate the schema, etc., all before it can parse the actual XML document in question. If you’re using XML as a protocol between two systems that are in heavy use, and need a quick response, then this overhead may seriously degrade performance.

Then again, if your system is available for third party developers as a Web service, then the detailed enforcement of the XML Schema may protect your application a lot more effectively from malicious — or just plain bad — XML packets. As an example, Muse.net is an interesting technology. They have a publicly-available SOAP API defined with an XML Schema that provides their developers more control over what they receive from the user community.

On the other hand, I was recently involved in designing a system to handle incoming transactions from multiple devices. In order to scale the system, the chosen service that processes requests is a SOAP server. However, the system is completely closed, and a simple DTD on the server is enough to ensure that the packets sent from the clients arrive complete and uncorrupted, without the additional overhead of XML Schema.

Frequently Asked Questions (FAQs) about XML DTDs and XML Schema

What is the main difference between XML DTD and XML Schema?

The primary difference between XML DTD (Document Type Definition) and XML Schema is that DTD is the older method for defining the structure of an XML document, while XML Schema is a more recent method that offers more flexibility and precision. XML Schema supports data types and namespaces, which DTD does not. This means that with XML Schema, you can specify the data type of an element (like string, integer, date, etc.), and you can also use namespaces to avoid naming conflicts in your XML documents.

Can I use both DTD and XML Schema in a single XML document?

Technically, it is possible to use both DTD and XML Schema in a single XML document. However, it is not recommended because it can lead to confusion and inconsistencies. It’s better to choose one method and stick to it for defining the structure of your XML documents.

How do I convert a DTD to an XML Schema?

Converting a DTD to an XML Schema can be a complex process, especially for large and complex DTDs. There are tools available online that can automate this process, but they may not be perfect and might require manual adjustments. The general process involves mapping each DTD element to an equivalent XML Schema element, and then defining the data types and constraints for each element in the XML Schema.

Why should I use XML Schema instead of DTD?

XML Schema offers several advantages over DTD. It supports data types, which allows you to specify the type of data that an element can contain. It also supports namespaces, which can help avoid naming conflicts in your XML documents. XML Schema is also written in XML, which makes it easier to work with if you’re already familiar with XML.

Can I validate an XML document against an XML Schema?

Yes, you can validate an XML document against an XML Schema to ensure that the document adheres to the structure and constraints defined in the schema. This can be done using various XML parsers and validation tools.
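
One common way to tell a validator which schema to use is a hint in the document itself. The sketch below assumes the bookstore schema has been saved next to the document as bookstore.xsd, a hypothetical file name, and that the schema has no target namespace. The xsi:noNamespaceSchemaLocation attribute is only a hint: a validator may use it, or may be told separately which schema to apply.

```xml
<?xml version="1.0"?>
<!-- bookstore.xsd is an assumed file name for the schema -->
<bookstore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="bookstore.xsd">
  <name>Mike's Store</name>
  <topic>
    <name>XML</name>
    <book isbn="123-456-789">
      <title>Mike's Guide To DTD's and XML Schemas</title>
      <author>Mike Jervis</author>
    </book>
  </topic>
</bookstore>
```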

What is the purpose of namespaces in XML Schema?

Namespaces in XML Schema are used to avoid naming conflicts in XML documents. They allow you to use the same element name in different parts of an XML document without causing a conflict.
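
As a small illustration, the document below uses two elements called name that come from different vocabularies. The namespace URIs and the order element are hypothetical placeholders. Each prefix binds its element to a different namespace, so a schema for each namespace can define name independently.

```xml
<?xml version="1.0"?>
<!-- Both namespace URIs are made-up examples -->
<order xmlns:bs="http://example.com/bookstore"
       xmlns:cust="http://example.com/customer">
  <!-- The name of the store, from the bookstore vocabulary -->
  <bs:name>Mike's Store</bs:name>
  <!-- The name of the customer, from the customer vocabulary -->
  <cust:name>A. Reader</cust:name>
</order>
```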

How do I define a complex type in XML Schema?

In XML Schema, a complex type is defined using the xsd:complexType element. This element can contain other elements and attributes, allowing you to define complex structures in your XML documents.

What is the difference between a simple type and a complex type in XML Schema?

In XML Schema, a simple type is a type that can contain only text. It cannot contain other elements or attributes. A complex type, on the other hand, can contain other elements and attributes, allowing you to define complex structures in your XML documents.

How do I specify the order of elements in XML Schema?

In XML Schema, you can specify the order of elements using the xsd:sequence element. This element allows you to define a sequence of elements that must appear in a specific order in your XML documents.

Can I define default values for elements in XML Schema?

Yes, in XML Schema, you can define default values for elements using the ‘default’ attribute. If an element with a default value is empty in the XML document, the default value will be used.
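
A minimal sketch of a default value follows. The format element and its default of paperback are hypothetical additions, used only to show the attribute.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <!-- An empty <format/> in a document is treated as "paperback" -->
        <xsd:element name="format" type="xsd:string" default="paperback"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Note that the default applies only when the element is present but empty. If the format element is left out of the document altogether, no default is supplied.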