Why Does The Registry Part Matter?

In previous articles, we looked at the value of the CSW-ebRIM data model and took some pains to separate the ideas of the data model from the notions of a catalogue or registry — see “Richer Semantics for Geography – CSW-ebRIM” (http://www.galdosinc.com/archives/2018) from July 2014, and “Managing Unstructured Data in CSW-ebRIM” (http://www.galdosinc.com/archives/1187) from September 2010. In this article, we take a different approach, and explain why the registry component that implements the CSW-ebRIM data model is also critically important.

When KML first appeared, people asked what the difference was between KML and GML. The reply was usually “Google Earth”, as Google Earth provided a raison d’être for KML that GML lacked (notwithstanding the existence of WFS).

A similar question is now being asked about the difference between a registry and a database. With a registry, it is easy to construct and evolve information models, create information model instances, and build relationships between objects, exercising the full capabilities of the CSW-ebRIM data model. This is much harder to do in a database, no matter how sophisticated it is.

When reading this article, don’t think about catalogues, and especially not catalogues of datasets or services. If you must think in this direction at all, think in terms of a registry of artifacts which are, or which describe, any kind of digital resource.

NoSQL Data Model

NoSQL is a new class of data management technologies that are increasingly used for big data systems and real-time web applications. NoSQL is also read as “Not only SQL”, to emphasize that many NoSQL systems do support SQL-like query languages.

Compared to relational databases, NoSQL databases are generally designed to scale further and perform better, two features needed to meet the modern data challenges of high volume and wide variety. Scalability is often improved by distributing a system across multiple servers, and NoSQL databases are typically better at this than relational databases.

NoSQL databases are non-relational by design: data can be modeled without the constraints of a relational schema, and stored and retrieved through mechanisms (key-value pairs, documents, wide columns, graphs) that are not based on rigid table structures. This approach simplifies design and makes it easier to scale systems horizontally.

NoSQL uses a flexible data model. Such flexible data models allow NoSQL databases to be designed with less effort than a relational database, and to be changed more easily when necessary. Having a data model that is not tied to a schema is also better for handling the variety of data being captured and processed in today’s applications, and for supporting both structured and unstructured data.

CSW-ebRIM Information Model

A registry provides the place where the information model lives, together with the instances that comply with it. This information model is created using CSW-ebRIM.

A registry can be seen as a storehouse of all sorts of information items, including images, vector geographic features, video clips, and documents. In addition, it provides the means to deal with all of these artifacts in multiple natural languages, through the support of localized strings for names, descriptions, and other string-valued properties.
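To make the idea of localized names concrete, here is a minimal Python sketch, assuming a simplified model in which a string-valued property maps language tags to text, loosely modeled on ebRIM's InternationalString. The classes, the fallback rule, and the sample object and identifier are hypothetical illustrations, not a registry API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InternationalString:
    """Simplified stand-in for an ebRIM InternationalString:
    one localized value per language tag (hypothetical model)."""
    values: dict[str, str] = field(default_factory=dict)

    def get(self, lang: str, fallback: str = "en") -> str | None:
        # Try the exact tag ("fr-CA"), then its primary language ("fr"),
        # then the fallback language; return None if none is present.
        for tag in (lang, lang.split("-")[0], fallback):
            if tag in self.values:
                return self.values[tag]
        return None


@dataclass
class RegistryObject:
    id: str
    name: InternationalString
    description: InternationalString = field(default_factory=InternationalString)


bridge = RegistryObject(
    id="urn:example:feature:bridge-42",  # hypothetical identifier
    name=InternationalString({"en": "Harbour Bridge", "fr": "Pont du port"}),
)
print(bridge.name.get("fr-CA"))  # Pont du port
print(bridge.name.get("de"))     # Harbour Bridge (falls back to English)
```

The object keeps a single identity while its name is presented in whatever language the client asks for.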

Consider, for example, Associations. In GML, an association is a property; in CSW-ebRIM, an Association is a special kind of Registry Object (like a UML Association Class), with attributes that point at the source and target objects it connects. Treated merely as a way of encoding information in a database, Associations do not provide much value: it is simply too tedious and confusing to trace objects from one to the next. In a CSW-ebRIM registry, however, Associations are treated and thought of as logical entities rather than encodings, which makes them more useful and easier to work with. Clients that manipulate and navigate Associations are easy to build, so Associations become useful constructs for finding and working with other objects through their relationships to objects already found.
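The point that an Association is an object in its own right, rather than a property, can be sketched in a few lines of Python. This is a hypothetical in-memory model: the `Registry` class, its navigation methods, and the association-type URIs are assumptions made for illustration, not part of CSW-ebRIM or any registry product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """An Association is itself a registry object: it has its own id and type
    and points at a source and a target object (like a UML Association Class)."""
    id: str
    association_type: str
    source_object: str
    target_object: str


class Registry:
    """Hypothetical in-memory registry holding only Associations."""

    def __init__(self) -> None:
        self.associations: list[Association] = []

    def associate(self, assoc: Association) -> None:
        self.associations.append(assoc)

    def targets_of(self, source_id: str, assoc_type: str | None = None) -> list[str]:
        """Navigate outward from an object already found."""
        return [a.target_object for a in self.associations
                if a.source_object == source_id
                and (assoc_type is None or a.association_type == assoc_type)]

    def sources_of(self, target_id: str, assoc_type: str | None = None) -> list[str]:
        """Navigate inward, e.g. 'what describes this feature?'"""
        return [a.source_object for a in self.associations
                if a.target_object == target_id
                and (assoc_type is None or a.association_type == assoc_type)]


DESCRIBES = "urn:example:AssociationType:Describes"  # hypothetical type URIs
DEPICTS = "urn:example:AssociationType:Depicts"
bridge = "urn:example:feature:bridge-42"

reg = Registry()
reg.associate(Association("urn:example:assoc:1", DESCRIBES, "urn:example:doc:survey", bridge))
reg.associate(Association("urn:example:assoc:2", DEPICTS, "urn:example:image:photo", bridge))

print(reg.sources_of(bridge))             # ['urn:example:doc:survey', 'urn:example:image:photo']
print(reg.sources_of(bridge, DESCRIBES))  # ['urn:example:doc:survey']
```

Because each Association carries its own type, a client can navigate from one found object to related ones, filtered by the kind of relationship, without knowing how the links are stored.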

Globally Unique Identifiers

A registry provides the mechanics to support global identifiers for all resources, their semantic classification and association with one another, their organization into logical collections, their selection by external identifiers, and the management of their life cycle status. A registry hides the mechanics of the data model in other areas as well. When you create some type of ExtrinsicObject, a “Building” for example, you create, manipulate, and search for building instances as if “Building” were the type’s identifier. From the registry’s perspective, however, the type is identified by a globally unique URI; as a result, it can have many names, and those names can be in many different languages.

This global identification mechanism applies to anything in the registry, from queries to associations, classification schemes, and categories.
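A small Python sketch shows how a type can be used by a friendly name while being identified by a URI. It assumes a hypothetical `TypeRegistry` that resolves any localized name to the type's URI; the URN scheme, class names, and sample data are illustrative only, and real ebRIM object types carry far more than this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ObjectType:
    id: str               # globally unique URI: the type's real identity
    names: dict[str, str]  # human-facing names, keyed by language tag


@dataclass
class ExtrinsicObject:
    id: str
    object_type: str      # refers to ObjectType.id, never to a display name


class TypeRegistry:
    """Hypothetical registry of object types, keyed by URI."""

    def __init__(self) -> None:
        self._types: dict[str, ObjectType] = {}

    def register(self, t: ObjectType) -> None:
        self._types[t.id] = t

    def resolve(self, name: str) -> str:
        """Map any localized name of a type to its unique URI."""
        matches = {t.id for t in self._types.values() if name in t.names.values()}
        if len(matches) != 1:
            raise LookupError(f"{name!r} matches {len(matches)} types")
        return matches.pop()


types = TypeRegistry()
types.register(ObjectType("urn:example:objectType:Building",
                          {"en": "Building", "fr": "Bâtiment", "de": "Gebäude"}))

# Users work with "Building" (or "Bâtiment"); the stored reference is the URI.
city_hall = ExtrinsicObject("urn:example:building:city-hall", types.resolve("Bâtiment"))
print(city_hall.object_type)  # urn:example:objectType:Building
```

Renaming a type or adding a name in another language changes only its `names`; every instance keeps pointing at the same URI.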

Extended Search Capabilities

A registry also provides a powerful search capability. One of the issues in modern geographic databases is that they may contain many objects which appear to be the same, in that they may have the same name, the same or nearly the same geographic location, and the same or nearly the same postal address. Simple text-based searches, or text combined with spatial data, are very often insufficient to provide a good search experience. A registry can help here as well. Semantics can be added, even after the fact, to classify data objects and to relate them to one another, supporting searches that are more specific and therefore more useful. Moreover, the search mechanism is by no means restricted to geographic features — it works equally well for documents, video clips, images, and just about any other artifact.
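The following Python sketch illustrates how classification added after the fact can separate look-alike objects that text and spatial filters alone cannot. It is a simplification under stated assumptions: classifications are stored as a set of hypothetical classification-node URIs on each feature (in ebRIM, Classifications are themselves registry objects), and `search` is an illustrative function, not a CSW query.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Feature:
    id: str
    name: str
    lon: float
    lat: float
    classifications: set[str] = field(default_factory=set)  # classification-node URIs


def search(features: list[Feature], text: str,
           bbox: tuple[float, float, float, float] | None = None,
           classified_as: str | None = None) -> list[str]:
    """Text match, optional bbox (minlon, minlat, maxlon, maxlat),
    and optional classification filter."""
    results = []
    for f in features:
        if text.lower() not in f.name.lower():
            continue
        if bbox is not None:
            minlon, minlat, maxlon, maxlat = bbox
            if not (minlon <= f.lon <= maxlon and minlat <= f.lat <= maxlat):
                continue
        if classified_as is not None and classified_as not in f.classifications:
            continue
        results.append(f.id)
    return results


# Two hypothetical objects with the same name and nearly the same location.
features = [
    Feature("urn:example:f:1", "Central Station", -123.100, 49.273),
    Feature("urn:example:f:2", "Central Station", -123.101, 49.273),
]
# Semantics added after the fact: classify the look-alikes.
features[0].classifications.add("urn:example:scheme:facility:RailStation")
features[1].classifications.add("urn:example:scheme:facility:BusTerminal")

bbox = (-123.2, 49.2, -123.0, 49.3)
print(search(features, "central", bbox))  # both ids: text + space cannot tell them apart
print(search(features, "central", bbox, "urn:example:scheme:facility:RailStation"))
# ['urn:example:f:1']
```

Nothing about the features themselves had to change; the extra semantics are attached to them and immediately usable in queries.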

Better and Faster

CSW-ebRIM is a rich and expressive data model that uses high-level constructs like taxonomies, associations, typed objects with properties, and collections to speed the development and evolution of business information models.
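As a small illustration of how a taxonomy helps a model evolve, here is a Python sketch of a hierarchical classification scheme in which a query for a broad category also matches objects classified under narrower ones. The `ClassificationScheme` class, its `is_within` method, and the node URIs are hypothetical; this is a sketch of the idea, not how any particular registry implements taxonomies.

```python
from __future__ import annotations


class ClassificationScheme:
    """Hypothetical taxonomy: each node URI maps to its parent (None for roots)."""

    def __init__(self, scheme_id: str) -> None:
        self.id = scheme_id
        self.parent: dict[str, str | None] = {}

    def add_node(self, node: str, parent: str | None = None) -> None:
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent node {parent!r}")
        self.parent[node] = parent

    def is_within(self, node: str | None, ancestor: str) -> bool:
        """True if node is the ancestor itself or lies anywhere beneath it."""
        while node is not None:
            if node == ancestor:
                return True
            node = self.parent.get(node)
        return False


T = "urn:example:scheme:facility"
facilities = ClassificationScheme(T)
facilities.add_node(f"{T}:Transportation")
facilities.add_node(f"{T}:RailStation", parent=f"{T}:Transportation")
facilities.add_node(f"{T}:BusTerminal", parent=f"{T}:Transportation")

# Which node each (hypothetical) object is classified under.
classified = {"urn:example:f:1": f"{T}:RailStation",
              "urn:example:f:2": f"{T}:BusTerminal"}

# Evolving the model later: a new node needs no schema change,
# and the existing "Transportation" query picks it up automatically.
facilities.add_node(f"{T}:FerryTerminal", parent=f"{T}:Transportation")
classified["urn:example:f:3"] = f"{T}:FerryTerminal"

transport = [obj for obj, node in classified.items()
             if facilities.is_within(node, f"{T}:Transportation")]
print(transport)  # ['urn:example:f:1', 'urn:example:f:2', 'urn:example:f:3']
```

The design choice here is that meaning lives in the taxonomy rather than in table structure, so extending the model is a data change, not a migration.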

With a CSW-ebRIM registry, an information model comes alive!