Book Review: Processing XML documents with Oracle JDeveloper 11g

This is a new feature here on Java Hair: a book review. I was recently approached by the publisher of this book, who asked if I would be interested in doing a review. The request was quite timely, I thought, since I have recently been working with XML Schema design (check out the XML category).


“Processing XML documents with Oracle JDeveloper 11g” seems more like the title of a whitepaper than a full-fledged book, but I found that the book actually covers a lot of topics that fall under the XML processing umbrella. That, and the fact that JDeveloper documentation can be difficult to come by, makes this book a pretty handy addition to your library if you develop with JDeveloper 11g and work with XML.

Starting out, the book covers the parsing of XML documents using both the SAX API and the DOM API. This is information that you could get elsewhere, but as the book is JDeveloper 11g specific, it also explains how to set up your projects and which libraries you need to add, since they may or may not be included with your JDeveloper distribution. That is very handy information for someone using JDeveloper.

There is a chapter on using JDeveloper to design an XML Schema, something I could have used a few months ago, actually. JDeveloper’s visual design feature for XML Schemas is a great tool and comparable to anything I’ve used with Eclipse and NetBeans. Following that is a chapter that covers validating your schema three different ways and creating these projects in JDeveloper.

There are some chapters that I didn’t expect to see in this book but that were quite welcome: a chapter devoted to transforming XML to PDF, another on transforming XML to MS Excel, one on storing XML in Oracle Berkeley DB XML, and even a chapter on Oracle XML Publisher.

So all in all, despite the self-imposed limitation of XML and JDeveloper, the author, Deepak Vohra, has managed to cram some very useful topics into his book. Though some of it isn’t really JDeveloper specific, he does make it relevant by walking through setting up each project in JDeveloper as well as building and running the resulting applications there.

The writing style is very dry, much like you’d expect from a reference book, and it should be treated as such; it isn’t something that you are going to want to read in one sitting. However, if you have work to do in XML and you are considering using or already using JDeveloper as your IDE, I would definitely recommend picking it up.

You can find the book at virtually any online bookstore or on the publisher’s website: Packt Publishing.


XML Schema Design: Part 3

This is Part 3 of a three-part series on XML Schema Design. Check out Part 1 or Part 2.

I recently helped complete a project for a large enterprise and this series was inspired by that work and some of the questions that were raised during that process. This last part of the series covers some ways to make your schema design more flexible.

Reasons to make it more flexible were covered in Part 1, but the basic idea is adopted from evolution. If your solution is extendable and adaptable, it will encourage more people in your organization to use it, ensuring its survival. Ideally, different applications within the enterprise will be able to make use of the schema without requiring an updated release of the XSD to adapt to each application’s specific needs.

Extendability

In order to achieve expandability within a single version of the schema it becomes necessary to have the types and elements within the schema allow the addition of different elements and even different attributes. This would allow a user of the schema to add their own elements to the schema without violating the schema definition and would therefore promote the schema’s use within the organization.

To provide extensibility to the schema, named complex types could have the following elements added to their definition:

<xsd:any namespace="##targetNamespace" processContents="strict" minOccurs="0" maxOccurs="unbounded" />
<xsd:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded" />

These wildcard definitions will render the schema invalid if optional elements appear before them in the same sequence, because a validator cannot tell whether an element matches the optional declaration or the wildcard. To prevent this error, wrap the generic elements in a type of their own. Your final definition would look like:

   <xsd:complexType name="ExtraData">
       <xsd:sequence>
           <xsd:any namespace="##targetNamespace" processContents="strict" minOccurs="0" maxOccurs="unbounded" />
           <xsd:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded" />
       </xsd:sequence>
   </xsd:complexType>

The addition of these two element definitions allows any other elements of the target namespace, as well as elements from any other namespace, to be added to the type. Finally, to allow any other attributes to be added, the following attribute definition could be added to named complex types:

<xsd:anyAttribute namespace="##any" processContents="skip" />
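As a sketch of how these pieces might fit together, the schema below attaches the ExtraData type and the anyAttribute wildcard to a named type. The AddressType name, its Street and City elements, the Extra element name, and the http://example.com/canonical namespace are all assumptions made up for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <!-- Wrapper type from above: holds all of the open content -->
   <xsd:complexType name="ExtraData">
       <xsd:sequence>
           <xsd:any namespace="##targetNamespace" processContents="strict" minOccurs="0" maxOccurs="unbounded" />
           <xsd:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded" />
       </xsd:sequence>
   </xsd:complexType>

   <!-- Hypothetical type: fixed fields first, open content last -->
   <xsd:complexType name="AddressType">
       <xsd:sequence>
           <xsd:element name="Street" type="xsd:string" />
           <xsd:element name="City" type="xsd:string" />
           <xsd:element name="Extra" type="tns:ExtraData" minOccurs="0" />
       </xsd:sequence>
       <!-- Accept any additional attributes without validating them -->
       <xsd:anyAttribute namespace="##any" processContents="skip" />
   </xsd:complexType>

   <xsd:element name="Address" type="tns:AddressType" />
</xsd:schema>
```

Because the wildcards live inside the Extra element, they never compete with Street, City, or the optional Extra element itself when a document is validated.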

A Better Approach?

This method of expandability works, but it does so by allowing almost any XML construct to be added to XML files in that ExtraData element. This may not always be what you want. Instead, by being careful to abstract out just enough to make the schema flexible, you may be able to achieve the same thing.

For instance, consider an XML Schema that contains many different discrete data points. Let’s take a simple user profile type definition as an example:

   <xsd:complexType name="UserProfileType">
       <xsd:sequence>
           <xsd:element name="FirstName" type="xsd:string"></xsd:element>
           <xsd:element name="LastName" type="xsd:string"></xsd:element>
           <xsd:element name="AccountCreated" type="xsd:dateTime"></xsd:element>
       </xsd:sequence>
       <xsd:attribute name="userId" type="xsd:string"/>
   </xsd:complexType>

An example of an XML document instance that validates against this schema definition might be:

  <Profile userId="rjava">
     <FirstName>Rob</FirstName>
     <LastName>Java</LastName>
     <AccountCreated>2009-05-26T09:00:00</AccountCreated>
  </Profile>

This might work fine for a while, but what happens when you want to keep track of the last time the user accessed the site? You would have to change the schema definition. The solution discussed above would work, but what if we create a generic data type:

   <xsd:complexType name="DataType">
       <xsd:sequence>
           <xsd:element name="Description" type="xsd:string" minOccurs="0"/>
           <xsd:element name="StringValue" type="xsd:string" minOccurs="0"/>
           <xsd:element name="DateValue" type="xsd:dateTime" minOccurs="0"/>
       </xsd:sequence>
       <xsd:attribute name="name" type="xsd:string" />
   </xsd:complexType>

We could use this generic data type inside our UserProfileType:

   <xsd:complexType name="UserProfileType">
       <xsd:sequence>
           <xsd:element name="Data" type="tns:DataType" minOccurs="0" maxOccurs="unbounded"></xsd:element>
       </xsd:sequence>
       <xsd:attribute name="userId" type="xsd:string"/>
   </xsd:complexType>

We could now represent the same data using an XML document like this:

  <Profile userId="rjava">
     <Data name="FirstName">
        <StringValue>Rob</StringValue>
     </Data>
     <Data name="LastName">
        <StringValue>Java</StringValue>
     </Data>
     <Data name="AccountCreated">
        <DateValue>2009-05-26T09:00:00</DateValue>
     </Data>
     <Data name="Email">
        <StringValue>[email protected]</StringValue>
     </Data>
  </Profile>

So now we have made the UserProfileType very fluid, maybe too fluid depending on what you want to accomplish. It is expandable simply by adding more instances of Data to the UserProfileType element in your XML document, but it doesn’t require any fields or even suggest any. A better use may be to combine the first two examples. That way you could enforce the required fields and make optional some of the more common fields, but still leave room for other elements that new applications may require.
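A minimal sketch of that combination might look like the following. Which fields are required and which are optional is an assumption here, as are the http://example.com/canonical namespace and the global Profile element.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <!-- Generic name/value holder from the example above -->
   <xsd:complexType name="DataType">
       <xsd:sequence>
           <xsd:element name="Description" type="xsd:string" minOccurs="0"/>
           <xsd:element name="StringValue" type="xsd:string" minOccurs="0"/>
           <xsd:element name="DateValue" type="xsd:dateTime" minOccurs="0"/>
       </xsd:sequence>
       <xsd:attribute name="name" type="xsd:string" />
   </xsd:complexType>

   <xsd:complexType name="UserProfileType">
       <xsd:sequence>
           <!-- Required fields every application can rely on -->
           <xsd:element name="FirstName" type="xsd:string"/>
           <xsd:element name="LastName" type="xsd:string"/>
           <!-- Common but optional field -->
           <xsd:element name="AccountCreated" type="xsd:dateTime" minOccurs="0"/>
           <!-- Room for application-specific data -->
           <xsd:element name="Data" type="tns:DataType" minOccurs="0" maxOccurs="unbounded"/>
       </xsd:sequence>
       <xsd:attribute name="userId" type="xsd:string"/>
   </xsd:complexType>

   <xsd:element name="Profile" type="tns:UserProfileType"/>
</xsd:schema>
```

With this definition, an application that wants to track the last access time can do so without a schema change. The LastAccessed name and the timestamp below are made up for illustration:

```xml
<Profile xmlns="http://example.com/canonical" userId="rjava">
   <FirstName>Rob</FirstName>
   <LastName>Java</LastName>
   <Data name="LastAccessed">
      <DateValue>2009-06-01T12:30:00</DateValue>
   </Data>
</Profile>
```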

Conclusion

It is very important that the Canonical XML Schema be as easy as possible to understand while still maintaining reusability and flexibility. A Canonical XML Schema is only going to be as good as its user base is large. There isn’t much point in investing in one if the organization as a whole is not going to adopt it.

Hopefully this series of articles has given you some ideas on how to design an XML Schema that your organization can make use of. Getting the business as a whole to adopt something like this isn’t going to be easy, especially if you don’t have an immediate need for one. My suggestion would be to start small: start with new applications that require an XML Schema. Prove the value of a proper enterprise-wide design by showing how the time and effort for enhancements and changes can be reduced, and you will go a long way in getting it adopted.


XML Schema Design: Part 2

Now that we’ve gotten the whats and whys out of the way, we can start to talk about the guidelines themselves.

Naming Conventions
Naming conventions should be used in order to provide the understandability required in any good schema.

Names for all elements, attributes and types should be explicit.  Abbreviations should only be used where such abbreviations are obvious to anyone familiar with the domain. Any XML Types should be suffixed by the word Type. Elements should not include any suffix.

UpperCamelCase should be used in naming XML Elements and Types. This means that the first letter of each word that makes up the name will be capitalized, including the first letter of the name. For example: CompanyName, AddressType, FirstName could all be valid names within the schema.

XML Attributes should be named using LowerCamelCase in which the first letter of the name is in lowercase and each additional word within the name will start with an uppercase letter.

The truth is, you can use any naming convention you want as long as it is consistent.  Consistency is the foundation of any good design and XML Schemas are no exception.
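The conventions above can be sketched in a small schema. CompanyName, FirstName, and AddressType are the example names from the text; the addressKind attribute, the Address element, and the http://example.com/canonical namespace are assumptions added for illustration.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <!-- Type: UpperCamelCase with the Type suffix -->
   <xsd:complexType name="AddressType">
       <xsd:sequence>
           <!-- Elements: UpperCamelCase, no suffix -->
           <xsd:element name="CompanyName" type="xsd:string"/>
           <xsd:element name="FirstName" type="xsd:string"/>
       </xsd:sequence>
       <!-- Attribute: lowerCamelCase -->
       <xsd:attribute name="addressKind" type="xsd:string"/>
   </xsd:complexType>

   <xsd:element name="Address" type="tns:AddressType"/>
</xsd:schema>
```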

Elements Versus Attributes

Since the XML Specification allows for the same types of information to be stored as values for attributes or values for elements, a long-standing debate is whether one should prefer attributes over elements for content or vice versa. For a Canonical XML Design that requires flexibility and the ability to expand in the future, it is recommended that elements be created for most content and that attributes only be used to provide descriptors for those elements when necessary.

Another best practice to follow is to use an element to represent data that can stand on its own, independent of any parent element, and use attributes to represent properties or meta-data of that element. For example, a Contact element may have a type attribute that tells the user what kind of contact information is provided, e.g. e-mail address or phone number.
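One way the Contact example might be declared is sketched below: the contact value is the element content and the kind of contact is an attribute. The ContactType name, making the attribute required, and the http://example.com/canonical namespace are assumptions.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <!-- The contact value is the content; the kind of contact is meta-data -->
   <xsd:complexType name="ContactType">
       <xsd:simpleContent>
           <xsd:extension base="xsd:string">
               <xsd:attribute name="type" type="xsd:string" use="required"/>
           </xsd:extension>
       </xsd:simpleContent>
   </xsd:complexType>

   <xsd:element name="Contact" type="tns:ContactType"/>
</xsd:schema>
```

A matching instance, with a made-up phone number, would be:

```xml
<Contact xmlns="http://example.com/canonical" type="phone">555-0100</Contact>
```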

As you begin the process of abstracting elements to allow for expandability, you may find that it makes sense to use elements in cases where you had used attributes before. That is okay. More on that when we discuss abstracting the schema.

Types Versus Elements

Another decision that needs to be made in designing an XML Schema is when to use types and when to use elements. In the case of a Canonical XML Schema, XML Types should be used extensively to make the schema easier to understand and easier to re-use. In cases where a generic XML Type could be used, the need to create a type is obvious. In cases where the element or type will most likely not be reused, creating a type is not strictly necessary, but an explicit type definition still lets potential users of the schema interpret the design more easily. So as a general rule, do not use anonymous complex types.
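To make the difference concrete, here is a sketch that declares the same structure once as an anonymous type and once as a named type. All element and type names, and the http://example.com/canonical namespace, are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <!-- Anonymous type: the structure is buried in the element and cannot be referenced -->
   <xsd:element name="BillingAddress">
       <xsd:complexType>
           <xsd:sequence>
               <xsd:element name="Street" type="xsd:string"/>
               <xsd:element name="City" type="xsd:string"/>
           </xsd:sequence>
       </xsd:complexType>
   </xsd:element>

   <!-- Named type: defined once, readable on its own, reusable by any element -->
   <xsd:complexType name="AddressType">
       <xsd:sequence>
           <xsd:element name="Street" type="xsd:string"/>
           <xsd:element name="City" type="xsd:string"/>
       </xsd:sequence>
   </xsd:complexType>

   <xsd:element name="ShippingAddress" type="tns:AddressType"/>
   <xsd:element name="MailingAddress" type="tns:AddressType"/>
</xsd:schema>
```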

Data Types

User-derived types are composites (Complex) or subsets (Simple) of existing types. The extensions are used to consistently constrain the schema so that its use becomes easier. For example, a CurrencyAmount type can be created so that whenever a currency amount is needed in the schema, the generic type can be used. This way if the definition of the type changes, it can be changed in one place.  It also means that the developer of extensions to the schema does not need to think about how currency amounts will be constrained because it has already been done.

Use of Simple Types

Simple user-derived types are subsets of existing types. These types constrain the lexical or value space of the parent type. An example of a simple type is a type that limits the value of an element to a list of values. A use for this may be in limiting the Currency type values to standard ISO currency codes.
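A minimal sketch of such a restricted simple type follows. The CurrencyCodeType name and the http://example.com/canonical namespace are assumptions, and only three ISO 4217 codes are listed to keep the example short.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <!-- Restricts xsd:string to a fixed list of currency codes -->
   <xsd:simpleType name="CurrencyCodeType">
       <xsd:restriction base="xsd:string">
           <xsd:enumeration value="USD"/>
           <xsd:enumeration value="EUR"/>
           <xsd:enumeration value="GBP"/>
       </xsd:restriction>
   </xsd:simpleType>

   <xsd:element name="Currency" type="tns:CurrencyCodeType"/>
</xsd:schema>
```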

Use of Complex Types

Complex Types are user-derived types that allow various elements to be combined and represented as a whole. A Complex Type is also necessary if you wish to create a type that uses an attribute. To continue our example from above, the CurrencyAmount may be a Complex Type that includes both the type of currency (e.g. US Dollar or Euro) and the amount (e.g. 123,000) as elements within it.

In order to provide a grouping of elements, an xsd:sequence is used. This sequence contains the elements included in the type and the order in which they are included. Elements within the sequence have two important attributes, minOccurs and maxOccurs, which are used to indicate whether the element is required and how many times it may appear in the sequence.
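The CurrencyAmount example could be sketched as follows. The CurrencyAmountType, Note, and Price names and the http://example.com/canonical namespace are assumptions. Currency is typed as a plain xsd:string here only to keep the block self-contained.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <xsd:complexType name="CurrencyAmountType">
       <xsd:sequence>
           <!-- minOccurs and maxOccurs both default to 1: required, exactly once -->
           <xsd:element name="Currency" type="xsd:string"/>
           <xsd:element name="Amount" type="xsd:decimal"/>
           <!-- Optional and repeatable -->
           <xsd:element name="Note" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
       </xsd:sequence>
   </xsd:complexType>

   <xsd:element name="Price" type="tns:CurrencyAmountType"/>
</xsd:schema>
```

Note that xsd:decimal does not accept thousands separators, so an amount of 123,000 would appear as 123000 in a document.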

File Structure

When a large schema is being created, it is best to use references to include the various complex types of the schema in one definition. This allows the definition to be split up among multiple files and makes the reading of those definitions much easier. It also becomes easier to reuse portions of the schema because you can reference the subset of complex types or elements that you wish to include.
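One way to split a schema across files is xsd:include, which pulls in definitions that share the same target namespace. The sketch below assumes two hypothetical files, AddressTypes.xsd and CurrencyTypes.xsd, that define AddressType and CurrencyAmountType. The file names, the OrderType type, and the http://example.com/canonical namespace are all made up for illustration.

```xml
<!-- Canonical.xsd (hypothetical file name) -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="http://example.com/canonical"
            targetNamespace="http://example.com/canonical"
            elementFormDefault="qualified">

   <!-- Pull in type definitions kept in separate files with the same target namespace -->
   <xsd:include schemaLocation="AddressTypes.xsd"/>
   <xsd:include schemaLocation="CurrencyTypes.xsd"/>

   <xsd:complexType name="OrderType">
       <xsd:sequence>
           <xsd:element name="ShipTo" type="tns:AddressType"/>
           <xsd:element name="Total" type="tns:CurrencyAmountType"/>
       </xsd:sequence>
   </xsd:complexType>

   <xsd:element name="Order" type="tns:OrderType"/>
</xsd:schema>
```

If the shared types lived in a different namespace, xsd:import would be used instead of xsd:include.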

Stay Tuned for Part 3…

Part 3 will discuss some ideas on making the schema extendable via abstraction and walking the line between extendable and understandable.

If you haven’t already, check out Part 1 of the series.


XML Schema Design: Part 1

Introduction

This post and the posts that follow are to provide some of my guidelines and best practices for creating and utilizing an enterprise-wide XML Schema.  I will start off with some background in this post, then move on to the guidelines and best practices in future posts.  Jack Van Hoof has a great article about Canonical Data Models (CDMs) and what they are good for on his blog.  This enterprise-wide XML Schema is an implementation of a CDM and will hereafter be referred to as a Canonical XML Schema.

Background
The primary requirement of a Canonical XML Schema and the related data model is to provide a standard format in which all content will be distributed, thereby requiring applications to adhere to this common format. If a new application is added to the platform, only a transformation between that application and the Canonical XML Schema will be needed to allow it to produce or consume the required content.

In addition, 5 criteria should be considered:

  1. Completeness - The entirety of elements in the source schemas should be present in the new schema.
  2. Minimalism - Each element should be defined only once.
  3. Expandability - The schema should be able to anticipate data that may not have originally been found in any of the source schemas; that is, it should allow its use to grow and not hinder its use in the future.
  4. Comprehension - The schema should be formulated in a way that allows for easy browsing and querying.
  5. Performance - Understanding how the content in the XML documents supported by the schema will be used can help in determining some of the structure within the schema. For instance, if one intended use of the produced XML is to provide rapid searching, then the schema should be structured to support fast searches.

Keep in mind that these criteria are often at odds with one another.  For example, designs that emphasize expandability do so at the risk of deemphasizing performance and comprehension.

Why Guidelines?

A problem with the XML content currently being produced and consumed by various applications within many enterprises is a lack of standards and guidelines for the creation of such content. A Canonical XML Schema will enforce adherence to a singular structure, thereby enforcing adherence to the guidelines and best practices set forth by the schema itself. In addition, the Canonical XML Schema must be built following guidelines and best practices. The guidelines and best practices need to be documented to allow producers and consumers of XML content to understand why the model is designed the way it is and how to expand upon that design when it is necessary to do so.

Think about a group of systems that have grown over the years and are communicating with each other via XML (or even without XML). Once there are more than 2 systems talking to each other, it makes sense to develop as much of a generic communication pipeline as possible and a Canonical XML Schema will help you do that.

[Figure: Communication without a Canonical XML Schema]

You can see in the picture above that in the enterprise described there are 9 translations of data being performed, one for each pairing of applications. As applications are added, the number of translations grows much faster than the number of applications, since each new application may need a translation for every application it talks to.

[Figure: Communication using a Canonical XML Schema]

In the second diagram, only 6 translations are being performed and the number of translations that need to be performed as new systems are put online grows in a linear fashion.  As new applications are added, only one translation of data needs to be performed, either from the new application to the Canonical XML Schema (if it’s a producer) or from the Canonical XML Schema to the new application (if it’s a consumer).
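The counts in the two diagrams are consistent with, for example, three producers and three consumers. That split is an assumption, since the diagrams are not reproduced here. With p producers and c consumers, the arithmetic would be:

```latex
\begin{align*}
T_{\text{point-to-point}} &= p \cdot c = 3 \cdot 3 = 9 \\
T_{\text{canonical}} &= p + c = 3 + 3 = 6
\end{align*}
```

Adding a fourth consumer under the same assumption would bring the point-to-point count to 12 but the canonical count only to 7.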

Next…

Part 2 will describe some of the best practices and guidelines and Part 3 will go into more depth around abstraction of elements and walking the thin line between expandable and understandable.



NetBeans - Working with XML Schemas

Lately I have been doing a lot of work with XML Schemas. The main thrust of the project I have been working on is to generate a Canonical XML Schema from many different XML Schemas that are used for many different reports but all report on basically the same type of data.

In order to make better use of my time on this project I decided to take a look to see what types of tools might be available for me to use.  The budget for tools is very low so free tools were at the top of the priority list.

I decided on using NetBeans because it seemed to have more capability with XML Schemas than Eclipse did.  I have since tried Eclipse and even JDeveloper.  I found Eclipse less intuitive to use than NetBeans and JDeveloper about the same as NetBeans though a lot more colorful.

When working with XML Schemas, NetBeans provides 3 different views that you can work with: the Source, Schema, and Design views.

[Figure: NetBeans XML Schema Source View]

The Source view allows you to work directly with the schema file itself.  This can be quite a time saver when copying and pasting element definitions or when moving elements or types around.  It provides color coding and the right-click menu will allow you to Check or Validate your schema in addition to formatting it so everything lines up correctly.  Just as in the other views you can use Goto from the right-click menu to switch to one of the other views.

[Figure: NetBeans XML Schema View]

The Schema view is where I started off doing most of my work.  It allows you to right click and create new types or elements with a wizard type interface.  Customizing the elements and changing properties can also be done here, very handy if you can’t remember the syntax for a particular attribute.  Copying and pasting can be done in the Schema view but I still haven’t quite gotten the hang of it and I find it much easier in the source code view.  The Schema view does provide the ability to refactor an element or type and rename it so that it becomes renamed in any other schemas in the project that reference it.  This alone makes it worth the price of admission even if you mostly work with the source view.

I found the wizards to be extremely helpful in developing schemas for the reason most wizard-type interfaces are useful: I did not have to remember the specific details of the syntax; I could just choose the properties I wanted. When choosing an existing built-in type, a small window provides a description of the element, which made it really easy to choose the types I wanted. The biggest drawback, which seems like an oversight, is the inability to add properties to the element you are creating inside the wizard. You can choose the type of element and name it, but if you want to change properties such as minOccurs or nillable, you have to create the element and then right-click and bring up the properties window, where they can be changed.

[Figure: NetBeans XML Schema Design View]

The Design view provides a nice picture of the schema or schemas you are working with and makes it easy to see your entire schema in a nice tree format. I found this view especially helpful when talking about the schema with the business non-techie folk. I had started a review in Eclipse but found its view hard for me to work with and not as easy for the non-techies to grasp; switching to the NetBeans view made the conversations much easier. One issue I noticed with this view is that once viewed, it isn’t always in sync with changes that are made to that schema file. Sometimes it seemed to be stuck showing me the last revision or a version from a couple of changes ago. Restarting NetBeans caused it to refresh the view, but closing and re-opening the schema itself did not.

You can also edit your schema from the Design view: you can drag and drop attributes and elements into your design, as well as delete or refactor elements and types, much as you can in the Schema view. Personally, I found the Schema view easier to work in and the Design view really nice for being able to step back and get an overall picture of the schema design.
