XML Schema Design: Part 2 April 21
Now that we’ve gotten the whats and whys out of the way, we can start to talk about the guidelines themselves.
Naming Conventions
Naming conventions give a schema the understandability that any good design requires.
Names for all elements, attributes and types should be explicit. Abbreviations should only be used where such abbreviations are obvious to anyone familiar with the domain. Any XML Types should be suffixed by the word Type. Elements should not include any suffix.
UpperCamelCase should be used in naming XML Elements and Types. This means that the first letter of each word that makes up the name is capitalized, including the first letter of the name. For example, CompanyName, AddressType and FirstName could all be valid names within the schema.
XML Attributes should be named using LowerCamelCase in which the first letter of the name is in lowercase and each additional word within the name will start with an uppercase letter.
The truth is, you can use any naming convention you want as long as it is consistent. Consistency is the foundation of any good design and XML Schemas are no exception.
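As a minimal sketch of these conventions, the fragment below names a type, two elements and an attribute. The names AddressType, StreetAddress, PostalCode and addressKind are hypothetical and chosen only for illustration; the xs prefix is the customary binding for the XML Schema namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Type: UpperCamelCase, suffixed with "Type" -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <!-- Elements: UpperCamelCase, no suffix -->
      <xs:element name="StreetAddress" type="xs:string"/>
      <xs:element name="PostalCode" type="xs:string"/>
    </xs:sequence>
    <!-- Attribute: lowerCamelCase (hypothetical name) -->
    <xs:attribute name="addressKind" type="xs:string"/>
  </xs:complexType>

  <xs:element name="Address" type="AddressType"/>

</xs:schema>
```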
Elements Versus Attributes
Since the XML Specification allows the same types of information to be stored as values for attributes or values for elements, a long-standing debate is whether one should prefer attributes over elements for content or vice versa. For a Canonical XML Design that requires flexibility and the ability to expand in the future, it is recommended that elements be created for most content and that attributes only be used to provide descriptors for those elements when necessary.
Another best practice is to use an element to represent data that can stand on its own, independent of any parent element, and to use attributes to represent properties or metadata of that element. For example, a Contact element may have a type attribute that tells the user what kind of contact information is provided, e.g. an e-mail address or a phone number.
As you begin the process of abstracting elements to allow for expandability, you may find that it makes sense to use elements in cases where you had used attributes before. That is okay. More on that when we discuss abstracting the schema.
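One way the Contact example might be declared is sketched below. The type name ContactType and the choice of xs:string for both the content and the attribute are assumptions made for illustration, not something the guidelines prescribe.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- The element content carries the data itself;
       the "type" attribute only describes what kind of data it is. -->
  <xs:complexType name="ContactType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="type" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="Contact" type="ContactType"/>

  <!-- A conforming instance could look like:
       <Contact type="email">jane@example.com</Contact> -->

</xs:schema>
```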
Types Versus Elements
Another decision that needs to be made in designing an XML Schema is when to use types and when to use elements. In the case of a Canonical XML Schema, XML Types should be used extensively to make the schema easier to understand and easier to reuse. In cases where a generic XML Type could be used, the need to create a type is obvious. In cases where the element or type will most likely not be reused, creating a type is not strictly necessary, but an explicit type definition still allows potential users of the schema to interpret the design more easily. So as a general rule, do not use anonymous complex types.
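The difference between the two styles can be sketched as follows. The names ShippingAddress, BillingAddress, AddressType and City are hypothetical; the anonymous form is shown only for contrast.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Anonymous complex type: the structure is buried inside the element
       declaration and cannot be referenced from anywhere else. -->
  <xs:element name="ShippingAddress">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="City" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Named complex type: the structure has a name that readers can look up
       and that other elements can reuse. -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="BillingAddress" type="AddressType"/>

</xs:schema>
```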
Data Types
User-derived types are composites (Complex) or subsets (Simple) of existing types. These derived types are used to constrain the schema consistently so that its use becomes easier. For example, a CurrencyAmount type can be created so that whenever a currency amount is needed in the schema, the generic type can be used. This way, if the definition of the type changes, it can be changed in one place. It also means that the developer of extensions to the schema does not need to think about how currency amounts will be constrained, because that has already been done.
Use of Simple Types
Simple user-derived types are subsets of existing types. These types constrain the lexical or value space of the parent type. An example of a simple type is a type that limits the value of an element to an enumerated list of values. A use for this may be in limiting the Currency type values to standard ISO 4217 currency codes.
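A minimal sketch of such a restriction is shown below. The type name CurrencyType is an assumption, and only three ISO 4217 codes are listed to keep the example short; a real schema would enumerate every code it intends to accept.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Restricts xs:string to a fixed list of currency codes.
       The list here is deliberately incomplete. -->
  <xs:simpleType name="CurrencyType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="USD"/>
      <xs:enumeration value="EUR"/>
      <xs:enumeration value="GBP"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="Currency" type="CurrencyType"/>

</xs:schema>
```

With this in place, a validator would accept a Currency element containing EUR and reject one containing a value outside the list.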
Use of Complex Types
Complex Types are user-derived types that allow various elements to be combined and represented as a whole. A Complex Type is also necessary if you wish to create a type that uses an attribute. To continue our example from above, the CurrencyAmount may be a Complex Type that includes both the type of currency (e.g. US Dollar or Euro) and the amount (e.g. 123,000) as elements within it.
To group elements together, a sequence (xs:sequence) is used. The sequence contains the elements included in the type and the order in which they must appear. Elements within the sequence have two important attributes, minOccurs and maxOccurs, which indicate whether the element is required and how many times it may appear. Both default to 1 when omitted.
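The CurrencyAmount example could be sketched as the complex type below. Following the naming guideline it is called CurrencyAmountType here; the child element names Currency and Amount, and the use of xs:string and xs:decimal, are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Combines the kind of currency and the amount into one reusable type. -->
  <xs:complexType name="CurrencyAmountType">
    <xs:sequence>
      <!-- Could instead use a restricted simple type for currency codes -->
      <xs:element name="Currency" type="xs:string"/>
      <xs:element name="Amount" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Price" type="CurrencyAmountType"/>

</xs:schema>
```

Note that xs:decimal does not permit thousands separators, so an amount of 123,000 would be written as 123000 in an instance document.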
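The occurrence attributes can be illustrated with a small sketch. OrderType and its child elements are hypothetical names used only to show the three common cases: required once, optional, and repeatable.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="OrderType">
    <xs:sequence>
      <!-- minOccurs and maxOccurs both default to 1: required, exactly once -->
      <xs:element name="OrderNumber" type="xs:string"/>
      <!-- Optional: may be omitted, may appear at most once -->
      <xs:element name="Comment" type="xs:string" minOccurs="0"/>
      <!-- Required and repeatable: one or more occurrences -->
      <xs:element name="LineItem" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Order" type="OrderType"/>

</xs:schema>
```

Because this is a sequence, an instance must list OrderNumber first, then Comment if present, then the LineItem elements.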
File Structure
When a large schema is being created, it is best to define the various complex types in separate files and use references to bring them together in one main definition. Splitting the definition across multiple files makes each part much easier to read. It also becomes easier to reuse portions of the schema, because you can reference only the subset of complex types or elements that you wish to include.
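One common way to do this is with xs:include, sketched below. The file name CommonTypes.xsd, the namespace http://example.com/canonical and the Invoice names are all hypothetical, and the fragment assumes that CommonTypes.xsd declares a CurrencyAmountType in the same target namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/canonical"
           targetNamespace="http://example.com/canonical"
           elementFormDefault="qualified">

  <!-- Pulls in the shared type definitions from a separate file.
       xs:include is for files with the same target namespace. -->
  <xs:include schemaLocation="CommonTypes.xsd"/>

  <xs:complexType name="InvoiceType">
    <xs:sequence>
      <xs:element name="InvoiceNumber" type="xs:string"/>
      <!-- CurrencyAmountType is assumed to be defined in CommonTypes.xsd -->
      <xs:element name="Total" type="CurrencyAmountType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Invoice" type="InvoiceType"/>

</xs:schema>
```

If the shared types lived in a different namespace, xs:import would be used in place of xs:include.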
Stay Tuned for Part 3…
Part 3 will discuss some ideas on making the schema extendable via abstraction and walking the line between extendable and understandable.
If you haven’t already, check out Part 1 of the series.