
Feature Equivalence and Canonical Representatives

According to ISO 19109 (General Feature Model), a feature is a model of some real world entity or phenomenon. Features can represent concrete entities such as buildings or roadways, abstract ones such as municipal boundaries, or even temporal events such as emergencies or other incidents.

A common problem in wide area data integration is determining which feature instances represent the same real world entity. Different data stores may identify features in different ways, and may use different data models and feature schemas. This makes it difficult to determine which feature instances are equivalent to one another, or related to one another in some other way (e.g. “isAPartOf”).

We can say that two feature instances are equivalent if they are intended to represent the same real world entity, meaning the same individual entity and not merely an entity of the same classification. The set of all feature instances that are equivalent in this way constitutes an equivalence class.

Given an equivalence class of feature instances, we can introduce the concept of a canonical representative for the equivalence class. This can be understood by analogy with equivalence classes in elementary mathematics. Consider, for example, the integers under congruence modulo 2: the even numbers form a single equivalence class, written [2]. The numbers 2, 4, 18, and 36 are all members of this class, and we can choose the number 2 as its canonical representative.
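
The arithmetic analogy can be made concrete with a short Python sketch. It partitions a handful of integers by their remainder modulo 2 and then picks the smallest member of each class as its representative. The helper name `equivalence_classes` and the "smallest member" rule are illustrative assumptions; any member of a class could serve equally well.

```python
from collections import defaultdict


def equivalence_classes(items, key):
    """Partition items into classes; items with the same key are equivalent."""
    classes = defaultdict(list)
    for item in items:
        classes[key(item)].append(item)
    return dict(classes)


numbers = [2, 4, 18, 36, 3, 7]

# Two integers are equivalent here when they leave the same remainder mod 2.
classes = equivalence_classes(numbers, key=lambda n: n % 2)

# Assumed convention: the smallest member stands for the whole class.
canonical = {remainder: min(members) for remainder, members in classes.items()}

print(classes[0])    # members of the class of even numbers
print(canonical[0])  # its chosen canonical representative
```

The same two steps, grouping by an equivalence relation and then nominating one stand-in per group, are what the registry described here does for feature instances.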

In the same manner, we can think of the equivalence class of feature instances and identify some particular feature instance as a canonical representative of the class. For example, we could use the CN Tower as the canonical representative of a collection of real world entities that include the tower itself, the restaurant at the top, the restaurant on the lookout level, the café at ground level, and the gift shop. We propose to do this by creating a registry of canonical representatives. A canonical representative would have a globally unique identifier, a name, a description, and possibly a nominal location, but it would not have other properties. (To read more about how this is achieved in a registry, see Classifications and Associations.)
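
As a rough illustration of how small such a registry entry is, here is a minimal Python sketch. The class names, the use of a UUID as the globally unique identifier, and the latitude/longitude pair for the nominal location are all assumptions made for this example; the registry itself is not specified at this level of detail here.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class CanonicalRepresentative:
    """Only the minimal properties named in the text; nothing else."""
    identifier: str                                         # globally unique identifier
    name: str
    description: str
    nominal_location: Optional[Tuple[float, float]] = None  # assumed (lat, lon)


class CanonicalRegistry:
    """Hypothetical in-memory registry keyed by global identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, CanonicalRepresentative] = {}

    def register(self, name: str, description: str,
                 nominal_location: Optional[Tuple[float, float]] = None
                 ) -> CanonicalRepresentative:
        # A random UUID is one possible way to obtain a globally unique id.
        rep = CanonicalRepresentative(str(uuid.uuid4()), name, description,
                                      nominal_location)
        self._entries[rep.identifier] = rep
        return rep

    def lookup(self, identifier: str) -> CanonicalRepresentative:
        return self._entries[identifier]


registry = CanonicalRegistry()
cn_tower = registry.register(
    "CN Tower",
    "Communications and observation tower in Toronto",
    (43.6426, -79.3871),  # approximate nominal location
)
print(registry.lookup(cn_tower.identifier).name)
```

Keeping the record this thin is deliberate: detailed properties stay with the feature instances in their own databases, and the canonical representative only provides a shared point of reference.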

The type of a feature instance should be determined by its classification according to a taxonomy of feature types, and not by the list of properties associated with the type. To make this clear, we shall refer to the taxonomic classification as the semantic type, and the type determined by a particular feature model (schema) as the schema type. Note that GML, via an application schema, determines a schema type and not a semantic type.

Each canonical representative is classified according to a canonical taxonomy of feature types that is also maintained in a registry. Two feature instances that are equivalent must have the same semantic type, namely the semantic type assigned to the canonical representative to which they are equivalent. These ideas are illustrated in the diagram below.
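
The distinction between semantic type and schema type can be sketched in Python as follows. Everything named here (the `TAXONOMY` entries, the identifier `urn:example:canonical:cn-tower`, the schema type strings, and the local ids) is hypothetical and chosen only to show that two instances with different schema types inherit one semantic type from their shared canonical representative.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical fragment of a canonical taxonomy: type -> parent type.
TAXONOMY = {"Tower": "Structure", "Structure": "Feature"}


@dataclass(frozen=True)
class CanonicalRepresentative:
    identifier: str
    name: str
    semantic_type: str  # classification taken from the canonical taxonomy


@dataclass(frozen=True)
class FeatureInstance:
    local_id: str
    schema_type: str    # determined by the source database's schema
    canonical_id: str   # reference to the canonical representative


def semantic_type(feature: FeatureInstance,
                  registry: Dict[str, CanonicalRepresentative]) -> str:
    """A feature's semantic type is that of its canonical representative."""
    return registry[feature.canonical_id].semantic_type


def are_equivalent(a: FeatureInstance, b: FeatureInstance) -> bool:
    """Equivalent means referencing the same canonical representative."""
    return a.canonical_id == b.canonical_id


rep = CanonicalRepresentative("urn:example:canonical:cn-tower", "CN Tower", "Tower")
assert rep.semantic_type in TAXONOMY  # the type must come from the taxonomy
registry = {rep.identifier: rep}

# Two databases model the same entity with different schema types.
a = FeatureInstance("B-1042", "bldg:Building", rep.identifier)
b = FeatureInstance("poi/77", "LandmarkPoint", rep.identifier)

print(are_equivalent(a, b))
print(semantic_type(a, registry), semantic_type(b, registry))
```

Because the semantic type is looked up through the canonical representative and never stored on the instance, equivalent instances cannot disagree about it.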

Diagram of Canonical Representatives


A registry of canonical representatives could be established globally, or for a state, a province, or just a city. Feature instances in all other databases would then reference the global identifier of the appropriate canonical representative to which they are equivalent.

Since in many cases it will not be possible to modify existing databases, an alternative mechanism can be provided. Proxy objects are created in another registry, and each proxy references both a canonical representative and the feature instance for which it is the proxy. In effect, the proxy states that the selected feature instance is equivalent to the associated canonical representative. This idea is also illustrated in the diagram.
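
One way to picture the proxy mechanism is the Python sketch below. The names `EquivalenceProxy` and `ProxyRegistry`, the database names, and the identifiers are assumptions for illustration only. The sketch also assumes that a feature instance has at most one proxy, so that it belongs to exactly one equivalence class.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class EquivalenceProxy:
    """States that an external feature instance is equivalent to a canonical representative."""
    canonical_id: str  # global identifier in the canonical registry
    source: str        # hypothetical name of the unmodified external database
    feature_id: str    # identifier of the feature inside that database


class ProxyRegistry:
    """Holds proxies separately, so existing databases need no changes."""

    def __init__(self) -> None:
        self._by_feature: Dict[Tuple[str, str], EquivalenceProxy] = {}
        self._by_canonical: Dict[str, List[EquivalenceProxy]] = defaultdict(list)

    def add(self, proxy: EquivalenceProxy) -> None:
        key = (proxy.source, proxy.feature_id)
        if key in self._by_feature:
            # Assumed rule: one proxy, and so one equivalence class, per feature.
            raise ValueError(f"{key} already has a proxy")
        self._by_feature[key] = proxy
        self._by_canonical[proxy.canonical_id].append(proxy)

    def canonical_for(self, source: str, feature_id: str) -> Optional[str]:
        proxy = self._by_feature.get((source, feature_id))
        return proxy.canonical_id if proxy else None

    def proxies_for(self, canonical_id: str) -> List[EquivalenceProxy]:
        return list(self._by_canonical.get(canonical_id, []))

    def equivalent(self, a: Tuple[str, str], b: Tuple[str, str]) -> bool:
        ca, cb = self.canonical_for(*a), self.canonical_for(*b)
        return ca is not None and ca == cb


proxies = ProxyRegistry()
proxies.add(EquivalenceProxy("urn:example:canonical:cn-tower", "city_buildings", "B-1042"))
proxies.add(EquivalenceProxy("urn:example:canonical:cn-tower", "tourism_poi", "poi/77"))

print(proxies.equivalent(("city_buildings", "B-1042"), ("tourism_poi", "poi/77")))
```

A feature with no proxy is simply unknown to the registry: `canonical_for` returns `None` and `equivalent` returns `False`, which keeps "not yet matched" distinct from "matched to something else".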

A registry of canonical feature representatives could be a core component of any Spatial Data Infrastructure, leveraging a canonical type taxonomy based on ISO 19126.