
Why GeoPortals are neither an SDI nor a Cyberinfrastructure

I have seen many instances lately where an organization starts down the road of developing a spatial data infrastructure (SDI), or a broader cyberinfrastructure, and ends up deploying only a GeoPortal. While cataloguing data sets and services is important, a catalogue alone hardly constitutes an SDI and, depending on how it is implemented, may not even be a good first step.

A 2006 ECAR study defined cyberinfrastructure as the coordinated aggregate of “hardware, software, communications, services, facilities, and personnel that enable researchers to conduct advanced computational, collaborative, and data-intensive research” (Harvey Blaustein, with Sandra Braman, Richard N. Katz, and Gail Salaway, “IT Engagement in Research” [Roadmap], Boulder, CO: EDUCAUSE Center for Analysis and Research, July 2006, p. 2).

We extend the term “cyberinfrastructure” to refer to an assembly of software, with any associated hardware, that facilitates information sharing and exchange, and supports the development of applications, over a wide area network and across multiple jurisdictions. If the cyberinfrastructure is primarily concerned with the sharing and exchange of geospatial information, we shall call it a Spatial Data Infrastructure (SDI).

This post examines the functional and architectural requirements of a cyberinfrastructure for information sharing, and compares them with what a typical GeoPortal offers.

A typical GeoPortal provides functionality for registering and discovering metadata about services and data sets. Some GeoPortals can hold metadata only about web services, such as Capabilities documents and WSDL documents, and are therefore suitable only for service discovery. This is useful for answering the question “what services are available in my cyberinfrastructure?”, but much more is required.
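To make the limitation concrete, here is a minimal Python sketch of a catalogue whose record types are hard-coded. The class names, the two record types, and the example endpoint are all hypothetical; this is an illustration of the fixed-type pattern, not the data model of any particular GeoPortal product.

```python
from dataclasses import dataclass, field

# Hypothetical GeoPortal-style catalogue: the record types are fixed in code.
FIXED_RECORD_TYPES = {"CapabilitiesDocument", "WSDLDocument"}


@dataclass
class CatalogueRecord:
    record_type: str
    title: str
    endpoint: str
    keywords: set[str] = field(default_factory=set)


class FixedTypeCatalogue:
    def __init__(self) -> None:
        self._records: list[CatalogueRecord] = []

    def register(self, record: CatalogueRecord) -> None:
        # Any artefact outside the hard-coded list is rejected outright.
        if record.record_type not in FIXED_RECORD_TYPES:
            raise ValueError(f"unsupported record type: {record.record_type}")
        self._records.append(record)

    def discover(self, keyword: str) -> list[CatalogueRecord]:
        # Answers "what services are available?" by simple keyword matching.
        return [r for r in self._records if keyword in r.keywords]


catalogue = FixedTypeCatalogue()
catalogue.register(CatalogueRecord("WSDLDocument", "Parcel lookup service",
                                   "https://example.org/parcels?wsdl", {"parcels"}))
print([r.title for r in catalogue.discover("parcels")])  # ['Parcel lookup service']

try:
    # A Unit of Measure is a legitimate artefact, but the catalogue has no slot for it.
    catalogue.register(CatalogueRecord("UnitOfMeasure", "metre", ""))
except ValueError as err:
    print(err)  # unsupported record type: UnitOfMeasure
```

Discovery works well for the types the catalogue was built for; everything else has nowhere to go.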

To begin with, any cyberinfrastructure will need to manage a wide variety of artefacts, which may include Application Schemas, Coordinate Reference Systems, Software Library Components, Simulation Models, Units of Measure, Currencies, Global Parameter Lists, Application Programming Interfaces, Organization Business Units, Web Service Interfaces, Dataset Descriptions, and Citizen Information. The variety of such artefacts across government is essentially unlimited, and their character is arbitrary.

This wide range of artefacts requires a registry that supports a full data model. The fixed set of artefact types found in a typical GeoPortal is neither flexible nor comprehensive enough. A registry provides a data model of objects with properties, taxonomies, relationships, and logical collections, allowing virtually any artefact type, and its relationships, to be modelled.
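The four building blocks named above (objects with properties, taxonomies, relationships, and logical collections) can be sketched as a small in-memory Python model. It is loosely in the spirit of registry information models such as OASIS ebXML RIM, but every class and method name here is hypothetical, and a real registry would add persistence, versioning, and a query language.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class RegistryObject:
    # Any artefact: an Application Schema, a Coordinate Reference System, a Unit of Measure...
    object_type: str  # free-form string, not drawn from a fixed list
    name: str
    properties: dict[str, str] = field(default_factory=dict)
    classifications: set[str] = field(default_factory=set)  # taxonomy node paths
    id: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


@dataclass
class Association:
    source_id: str
    target_id: str
    association_type: str  # hypothetical examples: "UsesCRS", "ImplementsInterface"


class Registry:
    def __init__(self) -> None:
        self.objects: dict[str, RegistryObject] = {}
        self.associations: list[Association] = []
        self.collections: dict[str, set[str]] = {}  # collection name -> member ids

    def register(self, obj: RegistryObject) -> str:
        self.objects[obj.id] = obj
        return obj.id

    def relate(self, source_id: str, target_id: str, association_type: str) -> None:
        self.associations.append(Association(source_id, target_id, association_type))

    def add_to_collection(self, collection: str, obj_id: str) -> None:
        self.collections.setdefault(collection, set()).add(obj_id)

    def related(self, source_id: str, association_type: str) -> list[RegistryObject]:
        return [self.objects[a.target_id] for a in self.associations
                if a.source_id == source_id and a.association_type == association_type]

    def classified_under(self, node: str) -> list[RegistryObject]:
        # A taxonomy node matches itself and its descendants ("a" matches "a/b").
        return [o for o in self.objects.values()
                if any(c == node or c.startswith(node + "/") for c in o.classifications)]


registry = Registry()
crs_id = registry.register(RegistryObject("CoordinateReferenceSystem", "WGS 84",
                                          {"epsg_code": "4326"}, {"reference/crs"}))
roads_id = registry.register(RegistryObject("DatasetDescription", "Provincial roads",
                                            classifications={"transport/roads"}))
registry.relate(roads_id, crs_id, "UsesCRS")
registry.add_to_collection("Transportation", roads_id)
print([o.name for o in registry.related(roads_id, "UsesCRS")])  # ['WGS 84']
print([o.name for o in registry.classified_under("transport")])  # ['Provincial roads']
```

Because `object_type` is just data, adding a new kind of artefact requires no schema change, which is the flexibility a fixed-type catalogue lacks.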

Of course, there is more to a cyberinfrastructure than the registration of artefacts, however arbitrary; the governance of these artefacts is also critically important. The rich variety of artefacts, and the need to deal with multiple, geographically distributed jurisdictions, requires that each artefact be assigned a globally unique identifier. Furthermore, each artefact must move through a set of life cycle states, and that set must depend on the artefact type, because the life cycle states appropriate for Citizen Information are rather different from those needed to manage Business Units or Components in a Software Library. The life cycle state is also an important determinant of whether a piece of data or metadata can be consumed by a given user, and it may often be a component of an access control policy that regulates access to an artefact by different user roles. Typically, GeoPortals provide only fixed sets of life cycle states and fixed user roles, which are poorly suited to managing information sharing across a state, or even within a single large organization.
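A minimal Python sketch of these governance rules might look like the following: each artefact gets a UUID-based URN as its globally unique identifier, its allowed life cycle transitions are looked up by artefact type, and read access is decided by the pair (type, state) together with the user's role. The state names, roles, and policy table are hypothetical examples chosen for illustration, not a recommended scheme.

```python
import uuid
from dataclasses import dataclass, field

# Hypothetical per-type life cycles: state -> states it may transition to.
LIFECYCLES: dict[str, dict[str, set[str]]] = {
    "CitizenInformation": {
        "Collected": {"Verified"},
        "Verified": {"Archived", "Expunged"},
        "Archived": {"Expunged"},
        "Expunged": set(),
    },
    "SoftwareComponent": {
        "Proposed": {"Approved", "Rejected"},
        "Approved": {"Released"},
        "Released": {"Deprecated"},
        "Deprecated": {"Retired"},
        "Rejected": set(),
        "Retired": set(),
    },
}

# Hypothetical access policy: (artefact type, state) -> roles allowed to read.
READ_POLICY: dict[tuple[str, str], set[str]] = {
    ("CitizenInformation", "Verified"): {"CaseWorker", "Auditor"},
    ("CitizenInformation", "Archived"): {"Auditor"},
    ("SoftwareComponent", "Approved"): {"Developer"},
    ("SoftwareComponent", "Released"): {"Developer", "Integrator"},
}


@dataclass
class Artefact:
    artefact_type: str
    state: str
    id: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")  # globally unique

    def __post_init__(self) -> None:
        if self.state not in LIFECYCLES[self.artefact_type]:
            raise ValueError(f"unknown state {self.state!r} for {self.artefact_type}")

    def transition(self, new_state: str) -> None:
        # Only transitions defined for this artefact type are permitted.
        allowed = LIFECYCLES[self.artefact_type][self.state]
        if new_state not in allowed:
            raise ValueError(f"{self.artefact_type}: {self.state} -> {new_state} not allowed")
        self.state = new_state


def can_read(artefact: Artefact, role: str) -> bool:
    # Access depends on both the life cycle state and the user's role.
    return role in READ_POLICY.get((artefact.artefact_type, artefact.state), set())


component = Artefact("SoftwareComponent", "Proposed")
component.transition("Approved")
print(can_read(component, "Integrator"))  # False: approved but not yet released
component.transition("Released")
print(can_read(component, "Integrator"))  # True
```

Keeping the life cycles and the policy as data rather than code is what lets each jurisdiction define its own states and roles.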

In any cyberinfrastructure, it is more or less assumed that data access and data processing are accomplished by some form of web service (e.g., SOAP, REST, or OGC-style services). The GeoPortal often functions as the catalogue of these services, but we have already described the shortcomings of a typical GeoPortal in this role, even for the simplest cyberinfrastructure. There are, however, more severe shortcomings related to the wide-area integration of databases, web services, and applications, and here the GeoPortal offers no support at all.

A general cyberinfrastructure for information sharing must provide a messaging infrastructure that supports secure, fine-grained data delivery from a producer to a consumer, using a store-and-forward or publish-subscribe model. Such an infrastructure would keep databases, web services, and applications synchronized, and would do so with minimal use of network bandwidth and other machine resources. Simply providing a limited catalogue of services, as a GeoPortal does, is insufficient. Suppose your personal information environment told you that something needed your attention, such as the arrival of an email, but left it to you to work out how to make the connections and to retrieve and deliver that email. That would not be a very satisfactory situation.
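The store-and-forward, publish-subscribe idea can be illustrated with a toy in-memory broker in Python. Producers publish only the fields that changed for a record; the broker stores a copy in each subscriber's queue, whether or not the subscriber is online, and forwards the queue on request; the consumer merges the changes into its local copy. All names (topics, subscriber ids, classes) are hypothetical, and a real messaging infrastructure would also need durable storage, authentication, encryption, and delivery acknowledgements.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class ChangeMessage:
    topic: str      # hypothetical topic name, e.g. "roads/updates"
    record_id: str
    changes: dict   # only the fields that changed, not the whole data set


class StoreAndForwardBroker:
    """Toy in-memory broker; queues are lost if the process exits."""

    def __init__(self) -> None:
        self._subscriptions: dict[str, set[str]] = defaultdict(set)  # topic -> subscriber ids
        self._queues: dict[str, deque] = defaultdict(deque)          # subscriber id -> messages

    def subscribe(self, subscriber_id: str, topic: str) -> None:
        self._subscriptions[topic].add(subscriber_id)

    def publish(self, message: ChangeMessage) -> int:
        # Store a copy for every subscriber of the topic; returns how many were queued.
        subscribers = self._subscriptions.get(message.topic, set())
        for sub in subscribers:
            self._queues[sub].append(message)
        return len(subscribers)

    def deliver(self, subscriber_id: str) -> list[ChangeMessage]:
        # Forward, and remove, everything queued while the subscriber was away.
        queue = self._queues[subscriber_id]
        delivered = list(queue)
        queue.clear()
        return delivered


def apply_changes(local_db: dict[str, dict], messages: list[ChangeMessage]) -> None:
    # Consumer side: merge each delta into the local copy to stay synchronized.
    for msg in messages:
        local_db.setdefault(msg.record_id, {}).update(msg.changes)


broker = StoreAndForwardBroker()
broker.subscribe("municipality-a", "roads/updates")
# The subscriber is offline; the change waits in its queue.
broker.publish(ChangeMessage("roads/updates", "road-17", {"surface": "paved"}))

local_copy = {"road-17": {"name": "Main St", "surface": "gravel"}}
apply_changes(local_copy, broker.deliver("municipality-a"))
print(local_copy)  # {'road-17': {'name': 'Main St', 'surface': 'paved'}}
```

Sending only the changed fields is what keeps bandwidth use low; the consumer never has to discover the service, poll it, or re-download the whole data set.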

In summary, a typical GeoPortal provides limited support for an information-sharing cyberinfrastructure, and a general registry is a much better starting point. Furthermore, a GeoPortal provides no support at all for wide-area data synchronization; it relies entirely on each participant's client software to request data and to integrate the retrieved data into its own environment.