
Capture Features Early in the Food Chain

One of the challenges that we all face when introducing feature models is that the source of much of our data may not be very feature-oriented. Because of the heritage of digital mapping, many organizations (yes, even today) are collecting geometry and annotation (text) elements, and then trying to construct features (i.e. models of real-world objects) after the fact. If you are a data integrator, you may have to mash together several of these “feature-less landscapes” using geometry and topology matching, or other inference mechanisms, to try to make features. This process is usually error prone, overly rigid, and labour intensive.

The situation is even worse when you want to apply an update, since you cannot know beforehand which feature (or even which bits of geometry) the update applies to. The situation improves if there are unique identifiers for the geometry elements, but it would be immeasurably better if there were actual feature types, feature instances, and unique feature identifiers.
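To make the contrast concrete, here is a minimal sketch (in Python, with made-up names) of what identity-based updates could look like once features exist: the supplier ships a change keyed by the feature identifier it was given, and the consumer applies it directly, with no geometry or topology matching. Everything here, from capture_feature to the in-memory catalogue, is purely illustrative, not any particular product's API.

```python
import uuid

# Hypothetical illustration: features carry a stable identifier and a type,
# so an update can be matched by identity rather than by geometry inference.
catalogue = {}

def capture_feature(feature_type, geometry, **properties):
    """Create a feature instance with a globally unique identifier."""
    fid = str(uuid.uuid4())
    catalogue[fid] = {"id": fid, "type": feature_type,
                      "geometry": geometry, "properties": properties}
    return fid

def update_feature(fid, geometry=None, **properties):
    """Apply an update directly to the identified feature instance."""
    feature = catalogue[fid]      # no guessing which line work this touches
    if geometry is not None:
        feature["geometry"] = geometry
    feature["properties"].update(properties)

# A supplier ships an update keyed by the identifier it was given,
# not a bundle of anonymous geometry to be re-matched downstream.
road_id = capture_feature("Road", [(0, 0), (120, 35)], name="Elm Street")
update_feature(road_id, surface="asphalt")
```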

Of course, if the aggregator says that to a supplier, and the supplier is able to comply, that may only move the problem farther upstream, as that supplier may also be a consumer and aggregator of “line work” from someone else. Ultimately, the problem can only be solved by capturing features at the source, meaning directly in the software and systems from which features arise, such as GPS tracks, interactive crowd-sourced editing (à la OSM), digitization of features in orthoimages or stereo models, and the extraction of features from LIDAR. The key is that the desired feature types must be available to be populated at the point of data capture.
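As a rough illustration of capture at source, the sketch below (with hypothetical type names) turns a raw GPS track into a typed feature instance with a unique identifier at the moment of capture, rather than exporting anonymous line work to be interpreted later.

```python
import uuid

# Hypothetical sketch: the capture tool knows the target feature types up front,
# so raw observations become typed, identified features immediately
# rather than being exported as anonymous "line work".
FEATURE_TYPES = {"FootPath", "CycleRoute", "Road"}   # assumed, published by the data user

def feature_from_gps_track(track_points, feature_type):
    """Wrap a raw GPS track as a feature instance of a known type."""
    if feature_type not in FEATURE_TYPES:
        raise ValueError(f"unknown feature type: {feature_type}")
    return {
        "id": str(uuid.uuid4()),   # unique feature identifier, assigned at source
        "type": feature_type,
        "geometry": {"kind": "LineString", "coordinates": list(track_points)},
        "properties": {},          # remaining properties can be filled in later
    }

path = feature_from_gps_track([(-123.10, 49.28), (-123.00, 49.29)], "FootPath")
```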

A consequence of this viewpoint is that feature type hierarchies and feature catalogues (abstract models of features as objects with typed properties, and even feature schemas) must become much more public, and must be shared by both the data producer and the data consumer. Major data users (e.g. municipalities, airports) need to carefully define and publish their schemas and their data dictionaries on the Internet, together with the schemas for global feature identification, preferably using a Registry Service such as Galdos INdicio™. Data capture contractors can then use these feature schemas to capture the desired feature instances directly, regardless of the raw data or analytical methods used.
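In the simplest possible terms, a published feature catalogue entry might look something like the sketch below. The FireHydrant type, its property names, and the URN are placeholders invented for illustration, not the schema of any real registry or of Galdos INdicio™; the point is only that the data user defines the types and the capture contractor validates against them.

```python
# Hypothetical sketch of a published feature catalogue entry: the data user
# defines feature types with typed properties, and capture contractors read
# this definition rather than inventing their own attribute names.
FEATURE_CATALOGUE = {
    "FireHydrant": {
        "geometry": "Point",
        "properties": {
            "installDate": "date",
            "pressureKPa": "integer",
            "owner": "string",
        },
        "identifier": "urn:example:features:FireHydrant",   # placeholder URN
    },
}

def validate(feature):
    """Check a captured feature instance against the published catalogue."""
    spec = FEATURE_CATALOGUE[feature["type"]]
    missing = set(spec["properties"]) - set(feature["properties"])
    return sorted(missing)   # properties that still need to be populated

hydrant = {"type": "FireHydrant", "geometry": (10.0, 20.0),
           "properties": {"installDate": "2024-05-01"}}
print(validate(hydrant))    # ['owner', 'pressureKPa'] still to be populated
```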

Of course, publishing feature schemas naturally leads to standard feature models, which we are now starting to see in things like CityGML and IFC for large-scale information, and in national standards for smaller-scale data. In fact, it may now be the case that standard feature models are more advanced for large-scale data than for smaller-scale data. This makes a lot of sense, since the headaches of converting “line work” to features are immeasurably greater with 3D geometry and topology and large-scale information.

It is worth noting that, for feature capture at source to be effective, it is not necessary to capture all of the properties of the feature. The most important things are that the geometric description of the feature be captured, and that the feature be assigned a unique identifier. Where feature instances share geometry with one another, that sharing must also be captured; this is particularly important for ensuring a modicum of topological and semantic integrity, especially when two features share a common boundary by definition. These types of constraints should, of course, be part of the feature model.
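Here is a tiny, hypothetical sketch of shared geometry captured by reference: two parcels point at the same boundary geometry by identifier, so the common edge exists exactly once and an edit to it cannot leave the two parcels disagreeing. The identifiers and structures are illustrative only.

```python
# Hypothetical sketch: two parcels reference the same boundary geometry by
# identifier, so a shared edge is stored once and cannot drift apart.
geometries = {
    "edge-42": [(0, 0), (0, 50)],                         # the common boundary
    "edge-43": [(0, 50), (30, 50), (30, 0), (0, 0)],
    "edge-44": [(0, 0), (-30, 0), (-30, 50), (0, 50)],
}

parcels = {
    "parcel-A": {"type": "LandParcel", "boundary": ["edge-42", "edge-43"]},
    "parcel-B": {"type": "LandParcel", "boundary": ["edge-42", "edge-44"]},
}

def move_shared_edge(edge_id, new_coords):
    """Editing the shared geometry updates both parcels consistently."""
    geometries[edge_id] = new_coords   # both parcels see the same change
```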

If, today, you still live in a “feature-less” world, it is time to push back – and push feature creation back to the source. You can do your part by creating and publishing your data dictionary and feature schemas online. Capturing features as early as possible in the food chain will simplify life for everyone.