Towards a Universal Data Interchange Standard Factory

Originally published on LinkedIn: https://www.linkedin.com/pulse/towards-universal-data-interchange-standards-factory-ron-lake

Registry Platforms can provide a solid basis for creating data interchange standards, and they can shorten the process and reduce the cost of creating them.

Many general application domains, such as the Internet of Things (IoT), Smart Cities, and Oil and Gas Pipelines, require a standard method for data interchange. The approach, in most cases, has been to develop the data interchange from scratch, to create a serialization of a domain-specific information model (or schema), or to use some combination of the two. Since standardization requires multi-agency co-operation, this can be a long and expensive process. We believe that there is a simpler way forward.

INdicio Web Registry Platform

The Open Geospatial Consortium (OGC) has created a Registry Platform standard called Catalogue Service for the Web electronic business Registry Information Model (CSW-ebRIM). Essentially, this is a combination of the ebRIM standard, created by OASIS (formerly by ebXML.org), and the OGC’s CSW/Catalogue standard, which introduces geospatial properties, based on GML (Geography Markup Language), and a search and transaction grammar very similar to the OGC/ISO Web Feature Service (WFS). The advantage of CSW-ebRIM over WFS is that it provides not only a search and transaction grammar, but also a very general, high-level data model that can support data semantics (such as taxonomies, associations, and collections), and also the notion of a Registry-Repository. The latter enables the inclusion of data object components of more or less arbitrary MIME type.

Using CSW-ebRIM, we can create a domain-specific application model (such as one for IoT devices), and we immediately have a “standard” serialization of that model via ebRIM XML. There is no need to argue about the serialization at all – just agree on the model, and we are done. In many cases, an application-specific domain model may already exist and need only be translated into the CSW-ebRIM data model. For example, in the case of aviation there are existing models for AIXM, WXXM, and FXXM. These can be readily recast in CSW-ebRIM, and serialized in ebRIM XML.
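As a rough illustration of what "agree on the model and the serialization follows" can look like, the sketch below builds an ebRIM-style XML record for a single IoT device using only the Python standard library. The namespace is the one used by ebRIM 3.0; the object type URN, the identifiers, the slot names, and the helper `device_to_ebrim` are hypothetical and chosen only for this example. The output is not validated against the ebRIM schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by ebRIM 3.0. All URNs under "urn:example:" are hypothetical.
RIM_NS = "urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"
ET.register_namespace("rim", RIM_NS)


def rim(tag):
    """Return the namespace-qualified tag name for an ebRIM element."""
    return f"{{{RIM_NS}}}{tag}"


def device_to_ebrim(device_id, name, slots):
    """Serialize one device of the (hypothetical) IoT model as an ExtrinsicObject."""
    obj = ET.Element(rim("ExtrinsicObject"), {
        "id": device_id,
        "objectType": "urn:example:objectType:IoTDevice",  # assumed type URN
    })
    # Model properties become name/value slots on the registry object.
    for slot_name, value in slots.items():
        slot = ET.SubElement(obj, rim("Slot"), {"name": slot_name})
        value_list = ET.SubElement(slot, rim("ValueList"))
        ET.SubElement(value_list, rim("Value")).text = str(value)
    # Human-readable name of the object.
    name_el = ET.SubElement(obj, rim("Name"))
    ET.SubElement(name_el, rim("LocalizedString"), {"value": name})
    return obj


device = device_to_ebrim(
    "urn:example:device:42",
    "Thermometer 42",
    {"unit": "Celsius", "location": "Building A"},
)
xml_text = ET.tostring(device, encoding="unicode")
print(xml_text)
```

The point of the sketch is that nothing in it is specific to IoT: a different domain model would change only the type URN and the slot names, not the shape of the serialization.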

The CSW-ebRIM support for Registry-Repository can be useful in several ways in the context of data interchange. In the first case, the repository can contain binary “attachments” which are described by serializable registry objects (ebRIM XML). This can be useful for backwards compatibility with binary encodings, to support the transport of binary attachments, and even to support proprietary encodings during a transition to an open standard encoding. In the second case, the repository might contain XML or similar text records that are also described by related CSW-ebRIM Registry Objects. Galdos has used this approach to capture virtually all GML application schemas including AIXM, CityGML, and WaterML. GML application schemas (or other XML application schemas) can be maintained in the Registry Platform and automatically applied to data instances in the data payload before they are inserted into the repository.

XML is, of course, quite bulky and a more compact encoding is required in many applications. It makes no sense, however, to develop a specific binary encoding in these cases. A more rational approach is to use one of the existing binary XML encodings, such as EXI or Fast Infoset. We then have a more or less canonical approach to creating a data interchange standard:

Figure: A Canonical Model for Data and Schema Interchange

Our argument is that the Application-Specific Information Model should be created using the CSW-ebRIM data model. Everything else is then for free.

Note that some might argue for using a relational model as the initial data model in the process. Others may argue for using XML Schema, as is done in GML. While both of these models are quite expressive, they are also very low level. The relational model offers no standard means of representing type inheritance, classification (taxonomies), relationships, or collections, and does not support the notion of an object with properties. Likewise, XML Schema provides no canonical support for the same constructs and neither do GML application schemas.

In addition to providing support for these useful constructs, Registry Platforms also provide the ability to check the conformance of data instances with the application information model, and can do so with degrees of conformance that range from strict to lax.
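To make the idea of graded conformance concrete, here is a minimal sketch of a checker with two levels. It is not how any particular Registry Platform implements conformance; the `ObjectTypeModel` class, the `check_conformance` function, and the two level names are assumptions made for this example. In "lax" mode only missing required properties are reported, while "strict" mode also rejects properties the model does not declare.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectTypeModel:
    """Hypothetical description of one object type in an application model."""
    name: str
    required_slots: set
    optional_slots: set = field(default_factory=set)


def check_conformance(model, slots, level="strict"):
    """Return a list of problems; an empty list means the instance conforms."""
    if level not in ("strict", "lax"):
        raise ValueError(f"unknown conformance level: {level}")
    present = set(slots)
    # Both levels insist on the required slots.
    problems = [
        f"missing required slot: {s}"
        for s in sorted(model.required_slots - present)
    ]
    # Only the strict level rejects slots the model does not declare.
    if level == "strict":
        known = model.required_slots | model.optional_slots
        problems += [f"unknown slot: {s}" for s in sorted(present - known)]
    return problems


model = ObjectTypeModel("IoTDevice", {"unit"}, {"location"})
instance = {"unit": "Celsius", "vendor": "Acme"}
print(check_conformance(model, instance, "lax"))
print(check_conformance(model, instance, "strict"))
```

A lax level of this kind is convenient while a standard is still evolving, since instances that carry extra, not yet agreed properties can still be exchanged.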

The Application-Specific Information Model (schema) can itself be serialized (as can its instances), and it can be immediately utilized by any other compatible registry. Moreover, the default ebRIM XML encoding can be easily translated into other XML encodings, into JSON or other text encodings, or into binary encodings (e.g. EXI) as noted above. A Registry Platform provides this kind of support right out of the box, which means that developers can experiment with evolving standards and do not have to wait for third-party tools to be created.
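The translation from the XML encoding into JSON can be sketched in a few lines. The example below parses a small ebRIM-style record and emits a JSON document. The sample record, its `urn:example:` identifiers, the function name `ebrim_to_json`, and the JSON layout are all assumptions for illustration; no standard JSON mapping for ebRIM is implied.

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by ebRIM 3.0; the sample record itself is hypothetical.
NS = {"rim": "urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"}

SAMPLE = """<rim:ExtrinsicObject
    xmlns:rim="urn:oasis:names:tc:ebxml-regrep:xsd:rim:3.0"
    id="urn:example:device:42"
    objectType="urn:example:objectType:IoTDevice">
  <rim:Slot name="unit">
    <rim:ValueList><rim:Value>Celsius</rim:Value></rim:ValueList>
  </rim:Slot>
  <rim:Name><rim:LocalizedString value="Thermometer 42"/></rim:Name>
</rim:ExtrinsicObject>"""


def ebrim_to_json(xml_text):
    """Translate one ebRIM-style registry object into a JSON string."""
    obj = ET.fromstring(xml_text)
    name = obj.find("rim:Name/rim:LocalizedString", NS)
    record = {
        "id": obj.get("id"),
        "objectType": obj.get("objectType"),
        "name": name.get("value") if name is not None else None,
        # Each slot maps to the list of its values.
        "slots": {
            slot.get("name"): [
                v.text for v in slot.findall("rim:ValueList/rim:Value", NS)
            ]
            for slot in obj.findall("rim:Slot", NS)
        },
    }
    return json.dumps(record, indent=2, sort_keys=True)


print(ebrim_to_json(SAMPLE))
```

Because the registry data model is generic, a translator like this is written once and applies to every application model built on it, rather than once per domain standard.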

Registry Platforms can provide a solid basis for creating data interchange standards, and they can shorten the process and reduce the cost to create them.