Well-Formed vs Valid

From SwinBrain

The terms "well-formed" and "valid" are both used to describe markup documents, in particular XML documents and applications of XML such as XHTML. The two terms are not interchangeable, however, and developers should understand the difference.

Well-Formed Document

A well-formed XML document complies with the syntactic rules of XML markup. These rules are strict but quite simple. The XML syntax rules include:

  • documents should be self-describing, starting with an XML declaration (the declaration is optional in XML 1.0 and required in XML 1.1)
  • a document must contain one or more elements
  • documents must contain a single root element
  • start and end tags must be used to identify elements (must have closing tags)
  • empty elements must be marked as such (with a self-closing />)
  • attribute values must be quoted (single or double quotes are fine)
  • element names and attribute names are case sensitive
  • element tags must be correctly nested (must not overlap)
  • element names and attribute names must start with a letter, an underscore or a colon (not a digit or other punctuation)
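
Several of the rules above can be tried out with any XML parser, since a conforming parser must reject a document that is not well-formed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule being illustrated.
samples = {
    "single root, no declaration": "<note><to>Ann</to></note>",
    "empty element marked as such": "<note><br/></note>",
    "overlapping tags": "<b><i>text</b></i>",
    "unquoted attribute value": "<note id=1/>",
    "two root elements": "<a/><b/>",
    "case mismatch in tags": "<Note></note>",
    "name starting with a digit": "<1note/>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The first two samples are reported as well-formed and the remaining five are rejected. Note that this only checks syntax; it says nothing about whether a note element is allowed to contain a br element.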

Other interesting things about XML documents

  • whitespace is preserved (unlike the way that whitespace is ignored by web browsers in HTML documents).
  • line endings are normalized: a carriage-return/line-feed pair, or a lone carriage return, is passed to the application as a single line-feed character.
  • HTML-style comments <!-- ... --> (strictly speaking, SGML-style) work as well.
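
The whitespace and line-ending behaviour can be observed directly. This is a small sketch, again assuming Python's standard xml.etree.ElementTree parser; the sample element is invented for the example.

```python
import xml.etree.ElementTree as ET

# Character data containing a run of two spaces and a Windows-style
# carriage-return/line-feed line ending.
root = ET.fromstring("<p>two  spaces\r\nnext line</p>")
text = root.text

# The double space is kept; the CR+LF pair arrives as a single "\n".
print(repr(text))
```

A web browser rendering HTML would typically collapse the double space when displaying it; the XML parser hands it to the application unchanged.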

A well-formed document is a good place to start, but being well-formed does not mean that the document's structure or content makes any sense. It only means that the syntax rules have been followed, which ensures that any XML parser can read the document and that other uses of the XML content can take place.

The concept of "well-formed" is a bit like an English sentence that is punctuated correctly: it starts with a capital letter, has a single space between words, maybe some commas, and a period at the end. However, this says nothing about the correct spelling of the words, or the grammatical sense or order of the words.

Valid Document

A valid XML document complies with the rules of a particular document specification. The specification is commonly written as an external DTD file; however, it might also be a DTD section inside the XML file itself, or an XML-based schema file (which also allows other features).

Before a document can be valid against a specification, it must be well-formed. In fact, that is the first requirement for a valid document.
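
The two-stage nature of the check (well-formed first, then valid) can be sketched as follows. This is a toy illustration only: the ROOT and ALLOWED_CHILDREN tables are a hypothetical stand-in for a real DTD or schema, which would also constrain element order, attributes and values, and real projects would normally use a validating parser rather than hand-written code like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification: the required root element, and for each
# element name the child element names it may contain.
ROOT = "note"
ALLOWED_CHILDREN = {
    "note": {"to", "body"},
    "to": set(),
    "body": set(),
}


def check(xml_text: str) -> str:
    """Classify xml_text as 'not well-formed', 'well-formed only' or 'valid'."""
    # Stage 1: the document must parse, i.e. be well-formed.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"

    # Stage 2: the parsed structure must match the (toy) specification.
    if root.tag != ROOT:
        return "well-formed only"
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            return "well-formed only"
        if any(child.tag not in allowed for child in parent):
            return "well-formed only"
    return "valid"


print(check("<note><to>Ann</to><body>Hi</body></note>"))
print(check("<note><from>Bob</from></note>"))
print(check("<note><to>Ann</note>"))
```

The three calls print "valid", "well-formed only" and "not well-formed" respectively: the second document follows every syntax rule but uses an element the specification does not allow, while the third never reaches the second stage at all.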

Perhaps the most common examples of document specifications are those for XHTML documents. There are three separate document types for XHTML 1.0: Strict, Transitional and Frameset. See HTML vs XHTML for more information about XHTML documents and their doctypes.

The XML specification recommends that XML documents (such as XHTML 1.0 Strict) include an XML declaration at the beginning, identifying themselves as XML documents; XML 1.0 makes the declaration optional, while XML 1.1 requires it. It is common practice to omit the declaration for XHTML documents. Most newer web browsers handle the declaration correctly, but older browsers such as Internet Explorer 6 do not, and as a result have been known to render the HTML content incorrectly.

This is also related to the issues of browsers interpreting HTML+CSS in "strict" or "quirks" modes.

The Short Summary

  • Well-formed: XML document meets the syntax requirements of XML.
  • Valid: XML document is well-formed AND meets the element/attribute/value requirements of a specific document specification.