
Dictionary Model for GML

Within the OGC, there has been much fuss over the past several years about the use of a specification paradigm called “Core plus extensions”. This has been portrayed as if it provided some sort of formal infrastructure to ensure the modularity of specifications. In my view, while modularity is an essential objective for any specification, the idea that there is a formal means to achieve it, i.e. one that does not depend on the vagaries of language and meaning, is hard to swallow. I do not believe that there is any formal means to decide what is, and what is not, “core” to a specification, nor any such means to decide how to decompose the “extensions” into logical “packages”. This is not to say that such decompositions cannot be made, simply that they are a lot more arbitrary than some people would like you to believe. Furthermore, the use of namespaces to encapsulate these more or less arbitrary extension packages is, in my view, incorrect.

As an alternative to the “core plus extensions” or “complete with profiles”, I would like to propose the dictionary or library model for GML.

In this model, we think in terms of a natural language like English or French, which has a grammar and a dictionary of words in the language. The grammar says something about how these words go together. The dictionary is just a big list of words, and their definitions, the definitions being written using sentences from the language that comply with the grammar. We don’t get too excited about the size of the dictionary as we go about our daily business using an appropriate subset of the dictionary.

If we apply this to GML (or other extensible encoding standards), we see that the grammar corresponds to the basic GML model (e.g. how we serialize an object as XML, how we serialize a class as XML Schema), and includes things like the “object-property-value” pattern, remote reference pattern, etc. The rest of GML is then nothing more than a dictionary of words in the GML language. If we look in the dictionary we see lots of words, since GML covers lots of ground, but we can expect things like Point, LineString, TimeInstant, and so forth. Just as in English, we can think of there being many such dictionaries, some being very complete and covering “most of the language” like the Oxford Dictionary of English, and others being very specialized and covering only a narrow domain like Aircraft Mechanics. These dictionaries might be constructed from one another in various ways. For example, I might make the “Pocket Dictionary for Travellers” from the Oxford Dictionary of English by extracting just the words I think travellers would like to know about and use.

This Dictionary Model for GML is similar in many ways to the use of object libraries in programming languages. If I write a program, it will frequently use other programs and components, written by others, that I access through a program library (e.g. API). When my program integrates these other programs and components, it does not “import” the entire program library or libraries, but only those programs or components that are actually used by my program. In using a Dictionary Model for GML, I would only use the schema components in the dictionary that my “program” (my application schema) actually uses. Of course, the naïve use of XML schema parsers does not support this model, but that is something that can be fixed by other means, and should not impact the GML standard.
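To make the library analogy concrete, here is a minimal sketch of how a tool might work out which dictionary entries an application schema really needs. It assumes a hypothetical dependency map in which each "word" lists the other words its definition relies on; the map, the entries in it, and the function name `components_needed` are illustrative only and are not taken from the GML schemas.

```python
from collections import deque

# Hypothetical dependency map: each dictionary "word" (schema component)
# lists the other words its definition relies on. The entries are
# illustrative and do not reproduce the real GML schema dependencies.
GML_DICTIONARY = {
    "Point": ["AbstractGeometry", "pos"],
    "LineString": ["AbstractGeometry", "posList"],
    "Polygon": ["AbstractGeometry", "LinearRing"],
    "LinearRing": ["AbstractGeometry", "posList"],
    "TimeInstant": ["timePosition"],
    "AbstractGeometry": [],
    "pos": [],
    "posList": [],
    "timePosition": [],
}


def components_needed(dictionary, used):
    """Return every word reachable from the words an application schema uses."""
    needed = set()
    queue = deque(used)
    while queue:
        word = queue.popleft()
        if word in needed:
            continue  # already collected through another path
        if word not in dictionary:
            raise KeyError(f"'{word}' is not defined in the dictionary")
        needed.add(word)
        queue.extend(dictionary[word])
    return needed


# An application schema that only talks about points and time instants.
subset = components_needed(GML_DICTIONARY, ["Point", "TimeInstant"])
print(sorted(subset))
```

In this sketch the application schema pulls in only the words it uses plus whatever those words depend on; Polygon and LineString stay in the dictionary without being loaded.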

How do we decide what words get into the dictionary? What is the governance process? Remember that we are talking about only the core GML elements here, and not those in derived dictionaries for particular application domains. Clearly we do need a process which ensures that any word (i.e. schema component) that we add to the dictionary is compliant with the GML model, that it correctly uses any other “words” from GML, and that it is semantically correct. This seems like a task for the OGC’s GML SWG; however, other groups within the OGC and elsewhere could submit word definitions for inclusion in the dictionary.

Should we couple governance with the use of XML namespaces? I would argue no. Namespaces in XML exist to disambiguate terms (elements, attributes) which are spelt the same, but which have different meanings. If the definition of Point in GML is not changed from one version to the next, then there is no reason for the namespace to change. Since many schema components in GML use elements, types, attributes, etc., from other schema components, there is no logical way to decompose GML into different namespaces, except using purely arbitrary means, like splitting coverages from geometry, or topology from geometry. Note that if one had realized coverages alone, one would have had to introduce points, lines, polygons, etc., in coverages. The interdependency between these categories surely means that creating namespace boundaries, as a means of governance, does not make sense. I see no problem in recognizing governance boundaries, based on domains of expertise, or other more or less arbitrary criteria. I just do not see the case for enshrining them using namespaces.

This is not to say that all of GML must forever be in a single namespace. For example, if in the future we change the meaning of an element (e.g. Point), but we wish to keep the name, then the new Point must clearly live in a new namespace.
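The disambiguating role of namespaces can be seen in a small sketch using Python's standard `xml.etree.ElementTree`. The first namespace URI is the one used by GML 3.1; the second, `http://example.org/gml-redefined`, is a made-up placeholder standing in for a future namespace in which Point has been given a different meaning.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
# Hypothetical namespace for a redefined Point; not a real OGC namespace.
NEW_NS = "http://example.org/gml-redefined"

doc = f"""<root xmlns:gml="{GML_NS}" xmlns:new="{NEW_NS}">
  <gml:Point><gml:pos>1 2</gml:pos></gml:Point>
  <new:Point><new:pos>1 2</new:pos></new:Point>
</root>"""

root = ET.fromstring(doc)

# ElementTree expands each prefixed name to "{namespace-uri}local-name",
# so the two Point elements carry different qualified names.
tags = [child.tag for child in root]

old_points = root.findall(f"{{{GML_NS}}}Point")
new_points = root.findall(f"{{{NEW_NS}}}Point")

print(tags)
print(len(old_points), len(new_points))
```

Both elements are spelt Point, yet a parser never confuses them, because the namespace is part of the name. That is the only job the namespace is doing here; it says nothing about who governs either definition.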

Using this dictionary model (please don’t confuse this discussion with gml:Dictionary elements) for GML enables a much more fine-grained modularity than is possible with any interpretation of “core plus extensions”. Let’s move in this direction.