Many people have charged that "GML is complex", but few have identified the origin of this complexity (you might also want to look at the article at http://www.javaworld.com/javaworld/jw-09-2005/jw-0905-xml.html). This is the subject of the current note. GML is regarded as complex for the following reasons:
- The specification is thick (over 600 pages).
- The specification describes many objects (over 1000 tags identified).
- GML uses application schemas.
- GML deals with complex topics (geometry, topology, coverages).
- GML separates presentation from content.
- GML has an object-property rule.
- GML is written in XML Schema.
The specification is thick
True. The specification is indeed long. It is, however, no longer than other important specifications such as XML Schema or SQL (over 1200 pages). One might compare the "complexity" of GML, measured by the size of the spec, to the complexity of a telephone book. The latter is also very thick: for any large city (where "books" are still in use) it will run to a thousand or more pages of very fine print. In all cases, however, the model underlying the telephone book is more or less the same and quite simple, namely a person's name, address, and telephone number. Much the same could be said of GML. The specification is long, but all of the objects described share the same model, and it is quite simple: an object (Curve, Point, Feature) and the object's properties. Moreover, this model has not changed to any appreciable degree since GML Version 1.0, so how to read the GML "phone book" is the same now as it always was. A simple model and a thick specification: this is because GML is essentially a content specification that uses a simple model to describe a large number of kinds of objects.
The specification describes many objects
This is true. There are over 1000 tags in GML, and hence a few hundred object types are described. How should I read the GML specification? To start with, read the parts that interest you or are important for your area of application. If you are not concerned with topology, you do not need to read that section. Ditto for coverages, observations, etc. For many users, a general understanding of features and geometries is enough. For others, only coordinate systems are important. It is just a function of the concepts you need in your application domain.
GML uses application schemas
Unlike many other XML Schema grammars, GML does not rely on a single closed schema to define GML application objects. If you want to have a road, river or church steeple, you will need to create an application schema. Some people find this requirement complex. It has a number of well-known precedents, however, including:
- Relational schemas: to create a table in a relational DBMS you need to decide on the table structure or schema. In the same way, to create an object in GML you need to create a GML application schema using XML Schema.
- Objects: to create an instance of an object in object-oriented languages like C++, Java, etc., you need to create a class; the class defines a "schema" for the object.
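The analogy can be made concrete with a minimal sketch of an application schema for a Road feature. It assumes GML 3.1.1 conventions (the gml:_Feature substitution group head and gml:AbstractFeatureType as the base type); the app namespace and the numLanes and centerLine properties are hypothetical names chosen purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the app namespace and property names are assumptions -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:gml="http://www.opengis.net/gml"
           xmlns:app="http://example.org/app"
           targetNamespace="http://example.org/app"
           elementFormDefault="qualified">

  <!-- Pull in the GML core schemas (GML 3.1.1 assumed) -->
  <xs:import namespace="http://www.opengis.net/gml"
             schemaLocation="http://schemas.opengis.net/gml/3.1.1/base/gml.xsd"/>

  <!-- The Road object: usable wherever a GML feature is allowed -->
  <xs:element name="Road" type="app:RoadType"
              substitutionGroup="gml:_Feature"/>

  <!-- The "table structure" of a Road: its properties -->
  <xs:complexType name="RoadType">
    <xs:complexContent>
      <xs:extension base="gml:AbstractFeatureType">
        <xs:sequence>
          <xs:element name="numLanes" type="xs:positiveInteger"/>
          <xs:element name="centerLine" type="gml:CurvePropertyType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

The complex type plays the role that a table definition or a class declaration plays in the two precedents above: it fixes which properties every Road instance may carry.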
Early in GML we considered creating a schema language in GML itself. Thus in one of the profiles of GML 1.0 you will see something like:
<gml:Feature typeName="Road"> .. </gml:Feature>
This was an attempt to make GML only a single schema. The difficulty with this approach is that:
- We would be creating our own schema language in GML, for which no tools exist or are likely to exist.
- While it might start very simple, it would likely grow into something complex like XML Schema as we added support for enumerations, ranges, etc.
One might note that other geographic languages also use schemas, in particular KML (Google). It has gone the GML v1.1 route and defined a new schema language. At the moment this seems to support only simple types, but people will surely want more. Then what do we do?
GML deals with complex topics
This is certainly true. The topics that underlie geography are not necessarily simple. Since GML exposes these objects directly, it exposes the complexity of the objects themselves. What is a Polygon? Can it have holes? What is a geometry complex? And so on. GML is the raw nuts and bolts of geography. To use GML, you need only understand the objects you have to deal with.
GML separates presentation from content
This is commonplace in XML: much XML is styled to XHTML or HTML for presentation. Nonetheless, this does introduce an additional level of complexity, as is always the case when a general problem (in this case map generation) is decomposed into multiple constituent parts. The parts need to be composed together to do the task, something that was not necessary when it was all one thing. Of course this decomposition provides other benefits: the components are simpler, and one can use different styling mechanisms for the same data or apply a single styling mechanism to multiple kinds of data. Hence this is a tradeoff. Note that KML (Google) is currently a graphic presentation language (like SVG), a style description language (like SLD/XSLT), and a geographic representation language (like GML), all at once.
GML has an object-property rule
GML provides a thin layer of semantics, namely the object-property rule. This means that if you look at GML and you find an object, the children of that object (in the XML sense) are the properties of that object, no more and no less. The children are not subtypes, nor are they objects contained in the parent object. GML properties express attributes and associations (relationships) of the parent object. When you create an application object (e.g. Road) you are expected to follow this same rule. Properties of a Road are encoded in XML (GML) as child elements, so a numLanes property appears as a numLanes child element of the Road element.
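A minimal sketch of such an encoding, assuming a hypothetical app namespace and an illustrative value of 4 (neither comes from a published schema):

```xml
<!-- Road is the object; numLanes is a property of that object -->
<app:Road xmlns:app="http://example.org/app">
  <app:numLanes>4</app:numLanes>
</app:Road>
```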
So numLanes is always numLanes(Road) or Road.numLanes. GML core schemas follow this same model. This means that a point in GML is not encoded in a minimal form that attaches the coordinates directly to the Point element, but rather in the somewhat longer form in which the Point element has a pos child element, where pos is the coordinates of the Point. GML stays as true as possible to the object-property model. Note that the object-property rule, like many things in GML, is borrowed from RDF.
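To make the contrast concrete, the two forms of a point might be sketched as the two fragments below. The coordinate values are illustrative, the compact form is a hypothetical encoding shown only for comparison (it is not valid GML), and the namespace assumes GML 3.1.

```xml
<!-- Fragment 1: hypothetical compact form, NOT valid GML;
     the coordinates are attached directly to the object -->
<Point>100 200</Point>

<!-- Fragment 2: GML form; the coordinates are the value of the
     pos property of the Point object -->
<gml:Point xmlns:gml="http://www.opengis.net/gml">
  <gml:pos>100 200</gml:pos>
</gml:Point>
```

The second fragment is longer, but it reads the same way as any other GML object: an element for the object, and child elements for its properties.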
GML is written in XML Schema
As we noted above, an early design decision in GML was that it must be inherently extensible, and that such extensibility should come from an external schema language and NOT from GML itself. In GML v1.0, both DTD and RDFS were provided as the schema languages. From GML 2.0 on we have chosen XML Schema. This rests on a few basic principles:
- We did not want to create a new schema language just for GML.
- We wanted something that was widely used.
- We wanted something for which there were many fast parsers.
From these requirements, XML Schema was selected. This is NOT to say that GML can only be expressed in XML Schema. In fact, there is consideration of also providing GML in OWL or RDFS (once again). Someone may construct a RelaxNG version of GML. This would be perfectly valid. Of course, this implies interoperability issues between one representation and another.
Much of the processing complexity of GML (and the visual complexity) derives from XML Schema. Some people will argue that another schema language would make things much simpler. I think this is not likely the case, at least if the schema language offers comparable functionality. Nonetheless, the implementation of the GML model in XML Schema does entail that GML application schema processors be able to do certain operations that are not completely trivial, such as tracing inheritance or dealing with substitution groups. For this reason, various vendors offer GML SDKs that hide these XML Schema details from software developers.
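As a rough illustration of what tracing inheritance and substitution groups involves, consider the sketch below. It assumes GML 3.1.1 naming (gml:_Feature, gml:AbstractFeatureType); the app namespace and the Road, Highway, numLanes and routeNumber names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a two-step derivation chain in an assumed application schema -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:gml="http://www.opengis.net/gml"
           xmlns:app="http://example.org/app"
           targetNamespace="http://example.org/app"
           elementFormDefault="qualified">

  <xs:import namespace="http://www.opengis.net/gml"
             schemaLocation="http://schemas.opengis.net/gml/3.1.1/base/gml.xsd"/>

  <!-- Step 1: Road substitutes for the abstract GML feature element -->
  <xs:element name="Road" type="app:RoadType"
              substitutionGroup="gml:_Feature"/>
  <xs:complexType name="RoadType">
    <xs:complexContent>
      <xs:extension base="gml:AbstractFeatureType">
        <xs:sequence>
          <xs:element name="numLanes" type="xs:positiveInteger"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- Step 2: Highway substitutes for Road and extends its type -->
  <xs:element name="Highway" type="app:HighwayType"
              substitutionGroup="app:Road"/>
  <xs:complexType name="HighwayType">
    <xs:complexContent>
      <xs:extension base="app:RoadType">
        <xs:sequence>
          <xs:element name="routeNumber" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

A processor that meets an app:Highway element in an instance document has to follow the substitution groups from Highway to Road to gml:_Feature, and the type derivations from HighwayType to RoadType to gml:AbstractFeatureType, before it can conclude that the element is a feature and which properties it inherits. This is the kind of bookkeeping an SDK would typically hide.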
So while I would not call GML "simple" – it is what one might call appropriately complex!!