XML Schema Design: Part 2

Now that we’ve gotten the whats and whys out of the way, we can start to talk about the guidelines themselves.

Naming Conventions
Naming conventions should be used in order to provide the understandability required in any good schema.

Names for all elements, attributes and types should be explicit.  Abbreviations should only be used where such abbreviations are obvious to anyone familiar with the domain. Any XML Types should be suffixed by the word Type. Elements should not include any suffix.

UpperCamelCase should be used in naming XML Elements and Types. This means that the first letter of each word that makes up the name will be capitalized, including the first letter of the name. For example, CompanyName, AddressType and FirstName could all be valid names within the schema.

XML Attributes should be named using LowerCamelCase in which the first letter of the name is in lowercase and each additional word within the name will start with an uppercase letter.

The truth is, you can use any naming convention you want as long as it is consistent.  Consistency is the foundation of any good design and XML Schemas are no exception.
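As a rough illustration of the conventions above, here is a minimal schema sketch. CompanyName and AddressType come from the examples earlier in this section; StreetAddress, PostalCode and the addressKind attribute are made-up names used only to show the casing rules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Type: UpperCamelCase, suffixed with "Type" -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <!-- Elements: UpperCamelCase, no suffix (hypothetical names) -->
      <xs:element name="StreetAddress" type="xs:string"/>
      <xs:element name="PostalCode" type="xs:string"/>
    </xs:sequence>
    <!-- Attribute: lowerCamelCase (hypothetical name) -->
    <xs:attribute name="addressKind" type="xs:string"/>
  </xs:complexType>

  <xs:element name="CompanyName" type="xs:string"/>
  <xs:element name="Address" type="AddressType"/>

</xs:schema>
```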

Elements Versus Attributes

Since the XML Specification allows the same types of information to be stored as values for attributes or values for elements, a long-standing debate is whether one should prefer attributes over elements for content or vice versa. For a Canonical XML Design that requires flexibility and the ability to expand in the future, it is recommended that elements be created for most content and that attributes be used only to provide descriptors for those elements when necessary.

Another best practice to follow is to use an element to represent data that can stand on its own, independent of any parent element, and to use attributes to represent properties or meta-data of that element. For example, a Contact element may have a type attribute that tells the user what kind of contact information is provided, e.g. an e-mail address or a phone number.
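The Contact example might be sketched as follows. This is only one possible shape: the ContactType name, the use of xs:string for the attribute, and the sample values are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- The element content is the data itself; the attribute describes it -->
  <xs:complexType name="ContactType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <!-- "type" says what kind of contact information the element holds -->
        <xs:attribute name="type" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="Contact" type="ContactType"/>

</xs:schema>
```

An instance document could then contain something like this (the address is a placeholder):

```xml
<Contact type="email">jane.doe@example.com</Contact>
```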

As you begin the process of abstracting elements to allow for expandability, you may find that it makes sense to use elements in cases where you had used attributes before.  That is okay.  More on that when we discuss abstracting the schema.

Types Versus Elements

Another decision that needs to be made in designing an XML Schema is when to use types and when to use elements. In the case of a Canonical XML Schema, XML Types should be used extensively to make the schema easier to understand and easier to re-use. In cases where a generic XML Type could be used, the need to create a type is obvious. In cases where the element or type will most likely not be reused, it is not strictly necessary to create a type. Even so, having an explicit type definition will allow potential users of the schema to more easily interpret the design. So as a general rule, do not use anonymous complex types.
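To make the difference concrete, here is a small sketch that puts an anonymous complex type next to a named one. The element names (BillingAddress, ShippingAddress, City) are illustrative assumptions, not part of any particular schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Discouraged: the type is anonymous, so nothing else can refer to it -->
  <xs:element name="BillingAddress">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="City" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Preferred: a named type that any element can reuse -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="ShippingAddress" type="AddressType"/>

</xs:schema>
```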

Data Types

User-derived types are composites (Complex) or subsets (Simple) of existing types. The extensions are used to consistently constrain the schema so that its use becomes easier. For example, a CurrencyAmount type can be created so that whenever a currency amount is needed in the schema, the generic type can be used. This way if the definition of the type changes, it can be changed in one place.  It also means that the developer of extensions to the schema does not need to think about how currency amounts will be constrained because it has already been done.

Use of Simple Types

Simple user-derived types are subsets of existing types. These types constrain the lexical or value space of the parent type. An example of a simple type is a type that limits the value of an element to a list of values. A use for this may be in limiting the Currency type values to standard ISO currency codes.
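A restricted simple type along these lines might look like the sketch below. The name CurrencyCodeType is an assumption, and only three ISO 4217 codes are listed to keep the example short; a real schema would enumerate every code it needs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Restricts xs:string to a fixed list of currency codes -->
  <xs:simpleType name="CurrencyCodeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="USD"/>
      <xs:enumeration value="EUR"/>
      <xs:enumeration value="GBP"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="Currency" type="CurrencyCodeType"/>

</xs:schema>
```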

Use of Complex Types

Complex Types are user-derived types that allow various elements to be combined and represented as a whole. A Complex Type is also necessary if you wish to create a type that uses an attribute. To continue our example from above, the CurrencyAmount may be a Complex Type that includes both the type of currency (e.g. US Dollar or Euro) and the amount (e.g. 123,000) as elements within it.
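One way the CurrencyAmount example could be written is sketched here. The child element names Currency and Amount, the choice of xs:decimal for the amount, and the plain xs:string currency are assumptions made to keep the sketch self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Combines the kind of currency and the amount into one reusable type -->
  <xs:complexType name="CurrencyAmountType">
    <xs:sequence>
      <!-- Kept as xs:string here; a restricted simple type could be used instead -->
      <xs:element name="Currency" type="xs:string"/>
      <xs:element name="Amount" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="CurrencyAmount" type="CurrencyAmountType"/>

</xs:schema>
```

Note that xs:decimal does not accept thousands separators, so an amount of 123,000 would be written as 123000 in an instance document.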

In order to provide a grouping of elements, an XML Schema sequence is used. This sequence contains the elements included in the type and the order in which they are included. Elements within the sequence have two important attributes, minOccurs and maxOccurs, which are used to indicate whether the element is required and how many times it may appear in the sequence.
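The sketch below shows how minOccurs and maxOccurs behave inside a sequence. PersonType and its child elements are hypothetical names chosen for illustration; both attributes default to 1 when omitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="PersonType">
    <xs:sequence>
      <!-- No occurrence attributes: required, exactly once -->
      <xs:element name="FirstName" type="xs:string"/>
      <!-- minOccurs="0": optional, at most once -->
      <xs:element name="MiddleName" type="xs:string" minOccurs="0"/>
      <xs:element name="LastName" type="xs:string"/>
      <!-- Optional and repeatable: zero or more phone numbers -->
      <xs:element name="PhoneNumber" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Person" type="PersonType"/>

</xs:schema>
```

Because this is a sequence, the children must appear in the order listed: FirstName, then MiddleName if present, then LastName, then any PhoneNumber elements.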

File Structure

When a large schema is being created, it is best to use references to include the various complex types of the schema in one definition. This allows the definition to be split up among multiple files and makes the reading of those definitions much easier. It also becomes easier to reuse portions of the schema because you can reference the subset of complex types or elements that you wish to include.
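A split like this could be sketched with xs:include, which pulls in definitions that share the same target namespace (xs:import is the counterpart for a different namespace). The file name CommonTypes.xsd, the example.com namespace and the OrderType definition are all assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://example.com/canonical"
           targetNamespace="http://example.com/canonical"
           elementFormDefault="qualified">

  <!-- Hypothetical file, assumed to define AddressType and
       CurrencyAmountType in the same target namespace -->
  <xs:include schemaLocation="CommonTypes.xsd"/>

  <!-- This file only adds what is specific to orders -->
  <xs:complexType name="OrderType">
    <xs:sequence>
      <xs:element name="ShippingAddress" type="AddressType"/>
      <xs:element name="Total" type="CurrencyAmountType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="Order" type="OrderType"/>

</xs:schema>
```

Another schema that only needs the shared types can include CommonTypes.xsd on its own without taking the order definitions along with it.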

Stay Tuned for Part 3…

Part 3 will discuss some ideas on making the schema extendable via abstraction and walking the line between extendable and understandable.

If you haven’t already, check out Part 1 of the series.


Oracle acquires Sun: Who needs to look out now?

As a Java developer who does a lot of work with Oracle products including JDeveloper and ADF, my head is still spinning a little from the news that Oracle is buying Sun Microsystems.

Oracle buying BEA hurt a little, though it was completely expected and a great move on Oracle’s part.  I was a little sad to see the application server competition field drop by one, but I was very happy that Oracle was smart enough to choose WebLogic.  At that point it was really the only decision they could make.

With Oracle buying Sun there is a lot of synergy, since many technologies are duplicated between the two companies.  Oracle owning both should make those technologies better and enable them to compete with the leaders in those respective areas.  The big ones that stick out for me:

1. Oracle’s JDeveloper and Sun’s NetBeans

Could they really afford to drop NetBeans?  Probably not.  But can they afford to drop JDeveloper?  No, not really.  Here the only thing that really makes sense is to merge the two, probably adding the ADF wizards and goodies like that into NetBeans.  At least, that is what I hope they do.  JDeveloper isn’t bad, but I only ever use it to develop ADF projects and I bet many, many people are in that same boat.  Combining the two could end up giving Eclipse a run for its money; hopefully the competition just spurs both to be better.

2. Oracle’s Oracle VM and Sun’s VirtualBox

I haven’t had much experience with Oracle VM, but I have lately become a huge fan of Sun’s VirtualBox.  It’s a great product and it lets me do everything I want for free.  Will this continue to be the case?  I don’t know.  I’m not an expert on virtualization in the enterprise; I use it for desktop VMs, and I haven’t seen much about VirtualBox working in that space.  I would imagine Oracle VM is all about virtualizing the network and competing with VMware on that level.  With the two together, VMware’s got some competition.

3.  Oracle’s Unbreakable Linux and Sun Solaris

Oracle had a great jumpstart to their Linux platform, basing it on the Red Hat codebase way back when.  Solaris was my first exposure to any type of Unix (Solaris and AIX, actually) and it has been around forever.  If the adoption of Linux has hurt anything, it’s probably been Solaris and, through that, sales of Sun’s hardware.  Oracle says that owning Solaris will enable them to tune the Oracle Database software to run even better on it, and since, according to Oracle, most of their database customers are using Solaris, I think they’ll probably do that.  I have no idea what will happen to Unbreakable Linux though.  Who has to look out with this one?  I’d say IBM.  Buying Sun probably would have been good for them in the products space; I think the only area IBM is going to be competing in in the future is services.  Red Hat has Ubuntu to worry about on the desktop side and now a bigger threat from Oracle and Sun on the server side, so they have their work cut out for them.

4.  Oracle Database and Sun’s MySQL

MySQL has a huge customer base, most of them probably non-paying.  I think with this one, Oracle just adds it to their ever-increasing repertoire of niche databases.  It won’t go away, but I see less adoption in the future, and maybe a boost for PostgreSQL if they can get their act together.

5. Sun’s Java and Oracle’s ADF

Oracle has always been a big player in the specifications for the Java language.  I’m sure someone else will go into all the details, because I honestly don’t know them off the top of my head, but I do know that many technologies and ideas that ADF is based on were either approved JSRs or close to approved JSRs.  Does Oracle’s acquisition of Sun and Java mean that they will be better equipped to push through whatever they want to add to the language?  Well, I don’t think it will be quite that easy, but I’m sure it makes it easier.

I’ve always been a Java guy at heart, I work with Oracle technology sometimes, and I think they have really come a long way, but Oracle owning Java does kind of scare me a little.  One thing Oracle does really well, and JDeveloper is great at this, is making complex technologies easy to use.  It is what Microsoft does really well.  .NET makes easy the things that Java makes hard.  ADF actually does a lot of the same.  The combination of ADF and Java together could pose a big threat to Microsoft’s .NET if Oracle does it right.

My first thought about Oracle owning Java is that many developers are going to jump up and down about it and complain.  Some will probably jump ship, maybe to .NET but probably to Ruby or PHP or something else.  I don’t think many corporations are going to change the direction of their IT departments though, so for them, it will be .NET or Java as it always has been.  In the end, I think most Java developers are going to remain Java developers and hopefully Oracle’s backing of Java will just end up making it a better language to work with.

Microsoft might also have more to worry about now that Oracle owns OpenOffice.  I hope that Oracle continues to invest in it, or it’ll end up being Microsoft Office vs. Google Apps and that’s about it.  I’m all for cutting edge, but Gmail hasn’t come out of Beta yet and I’d like to see Microsoft have some competition in this area.

So I wanted to get my thoughts out there while they were floating around in my head, and hopefully yours, so I could hear your opinions on the topic.  Please let me know what you think about this acquisition and what you think it means for the future of technology and competition in the field.


Simple Fix for Tomcat on Windows

I finally found the answer to one of life’s most difficult questions: why is it that every time I redeploy a WAR file in Tomcat it fails?  The answer is that in Windows, Tomcat will hold a lock on certain files in the web application.  There is a simple fix to this dilemma: edit the context.xml file in your TOMCAT_HOME\conf directory.

Add 2 attributes to the Context element in the file, so that the element description looks like this:

<Context antiResourceLocking="true" antiJARLocking="true">

Problem solved.  You should be able to just copy a new version of the WAR file on top of the old one in the webapps directory of Tomcat and it will redeploy.

This works for me on Windows XP using apache-tomcat-6.0.18.


XML Schema Design: Part 1

Introduction

This post and the posts that follow are to provide some of my guidelines and best practices for creating and utilizing an enterprise-wide XML Schema.  I will start off with some background in this post, then move on to the guidelines and best practices in future posts.  Jack Van Hoof has a great article about Canonical Data Models (CDMs) and what they are good for on his blog.  This enterprise-wide XML Schema is an implementation of a CDM and will hereafter be referred to as a Canonical XML Schema.

Background
The primary requirement of a Canonical XML Schema and the related data model is to provide a standard format in which all content will be distributed, thereby requiring applications to adhere to this common format.  If a new application is added to the platform, only a transformation between that application’s format and the Canonical XML Schema will be needed to allow it to produce or consume the required content.

In addition, 5 criteria should be considered:

  1. Completeness - The entirety of elements in the source schemas should be present in the new schema.
  2. Minimalism - Each element should be defined only once.
  3. Expandability - The schema should be able to anticipate data that may not have originally been found in any of the source schemas; that is, it should allow its use to grow and not hinder its use in the future.
  4. Comprehension - The schema should be formulated in a way that allows for easy browsing and querying.
  5. Performance - Understanding how the content in the XML documents supported by the schema will be used can help in determining some of the structure within the schema.  For instance, if one intended use of the produced XML is to provide rapid searching, then the schema should be structured to support fast searches.

Keep in mind that these criteria are often at odds with one another.  For example, designs that emphasize expandability do so at the risk of deemphasizing performance and comprehension.

Why Guidelines?

A problem with the XML content currently being produced and consumed by various applications within many enterprises is a lack of standards and guidelines for the creation of such content.  A Canonical XML Schema will enforce adherence to a singular structure, thereby enforcing adherence to the guidelines and best practices set forth by the schema itself.  In addition, the Canonical XML Schema must itself be built following guidelines and best practices.  These need to be documented to allow producers and consumers of XML content to understand why the model is designed the way it is and how to expand upon that design when it is necessary to do so.

Think about a group of systems that have grown over the years and are communicating with each other via XML (or even without XML). Once there are more than 2 systems talking to each other, it makes sense to develop as much of a generic communication pipeline as possible and a Canonical XML Schema will help you do that.

Communication without a Canonical XML Schema

You can see in the picture above that in the enterprise described there are 9 translations of data being performed, one for each pairing of applications. As applications are added, the number of translations grows quadratically rather than linearly: n applications can form up to n(n-1)/2 pairings.

Communication using a Canonical XML Schema

In the second diagram, only 6 translations are being performed and the number of translations that need to be performed as new systems are put online grows in a linear fashion.  As new applications are added, only one translation of data needs to be performed, either from the new application to the Canonical XML Schema (if it’s a producer) or from the Canonical XML Schema to the new application (if it’s a consumer).

Next…

Part 2 will describe some of the best practices and guidelines and Part 3 will go into more depth around abstraction of elements and walking the thin line between expandable and understandable.

