Some things in life take far longer than one would expect. Twenty-five years ago, I was involved in developing spatial data infrastructure architectures in various parts of the world. At the time, key geospatial technologies included conflation and geometric matching, spatial indexing, and data format conversion. Today, one could argue almost the same thing, in spite of the immense progress that has been made in information technology in general. Somehow, the geospatial world is still locked in file transfer, format conversion, and tedious post-conversion data integration tasks such as conflation.

Consider an example. A police department acquires the road network from the city and defines policing districts by road and street boundaries. Sensible enough, except that the department did not get good data in the first place. Some of the roads have changed, and the districts were defined by simply drawing them as an overlay on the road network rather than by referencing the actual road segments. The result is districts that don’t match the true roads at all, which means the data cannot reliably be used for crime analysis and similar purposes. The usual solution is to conflate the district data (district boundaries and associated attributes) with the current road data, using the new road geometry and the existing police district data; this is accomplished by geometric matching.
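Geometric matching in this setting often comes down to snapping drawn geometry onto the authoritative network. A minimal sketch in plain Python (the single-road example is hypothetical; real conflation tools add spatial indexing, tolerance thresholds, and topology rules): each vertex of a drawn boundary is moved to the nearest point on any road segment.

```python
import math

def nearest_point_on_segment(p, a, b):
    """Project point p onto segment a-b and return the closest point on the segment."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return a  # degenerate segment
    # Clamp the projection parameter so the result stays within the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return (ax + t * dx, ay + t * dy)

def snap_boundary(boundary, road_segments):
    """Snap each vertex of a drawn district boundary to the nearest road segment."""
    snapped = []
    for p in boundary:
        candidates = (nearest_point_on_segment(p, a, b) for a, b in road_segments)
        snapped.append(min(candidates, key=lambda q: math.dist(p, q)))
    return snapped

# A hand-drawn boundary vertex slightly off a road running along y = 0:
roads = [((0.0, 0.0), (10.0, 0.0))]
print(snap_boundary([(3.0, 0.4)], roads))  # the vertex lands on the road
```

This is only the geometric core; production conflation also matches attributes and preserves topology between adjacent districts.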
All well and good, but why are we doing this at all? Why were the police districts not defined relative to the actual road segments (not as a drawn overlay), and why is there not an information infrastructure in place that automatically provides the police with the latest road information, thus automatically updating the police districts in the process? It is the lack of such infrastructures that makes geospatial data integration such a difficult and tedious process.
Of course, you will argue that if the police and the road supplier use the same GIS software, all is well. In practice this is not workable: the police may be a national organization, while the road supplier is only a local one. Furthermore, and for much the same reason, the police and the road supplier have different road models, so the problem persists even when both use the same vendor's software. Shared software is simply not a viable general solution.
Another question concerns getting started. How does a police force migrate to this automated infrastructure solution? To begin with, it needs to define the police districts in terms of road and street segments (and possibly other features), not just as a visual overlay. This calls for holding a copy of the road geometry data together with an identity management mechanism that lets the district features be defined by reference to the road and street segments. Most GIS and spatial databases support such mechanisms. The only missing piece, then, is for the police to subscribe to the road segments from a suitable road supplier, which may also require data transformation (e.g. coordinate conversion and schema-based transformation). These are the functions of the supporting information infrastructure. Once this is set up, the police agency has no further work to do and automatically remains in sync with the road provider.
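A minimal sketch of such identity-based referencing (all segment and district IDs here are hypothetical): a district stores stable segment identifiers rather than copied coordinates, so when the road supplier updates a segment's geometry, the district's resolved boundary follows automatically.

```python
# Current road geometry, keyed by stable segment ID (IDs are hypothetical).
road_segments = {
    "seg-101": [(0.0, 0.0), (1.0, 0.0)],
    "seg-102": [(1.0, 0.0), (1.0, 1.0)],
}

# A district is defined as a list of segment IDs, not drawn coordinates.
districts = {"district-A": ["seg-101", "seg-102"]}

def district_geometry(district_id):
    """Resolve a district's boundary from the current road geometry."""
    return [road_segments[seg_id] for seg_id in districts[district_id]]

before = district_geometry("district-A")

# The road supplier publishes a corrected geometry for one segment;
# no change to the district definition is needed.
road_segments["seg-102"] = [(1.0, 0.0), (1.2, 1.1)]
after = district_geometry("district-A")
print(after[1])  # the district boundary now reflects the updated segment
```

The design choice is the same one databases make with foreign keys: reference by identity, resolve geometry at read time.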
If we see data access as a form of business process integration, then many of the problems associated with ad hoc file transfer or ETL (extract, transform, load), especially those related to data integration, can be eliminated. Moreover, metadata capture becomes much easier to automate, since the data is acquired in context. All of this helps ensure that the data subscriber has the most current and most accurate data available for their business processes.
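One way to picture this is a subscription that transforms each update on delivery and records provenance metadata as a side effect, so the metadata is captured in context rather than filled in by hand afterwards. A toy sketch, with a hypothetical provider name and a placeholder transformation step:

```python
from datetime import datetime, timezone

def transform(feature):
    """Placeholder for coordinate conversion / schema-based transformation."""
    return {**feature, "crs": "EPSG:4326"}  # assumed target CRS, for illustration

class RoadFeed:
    """Minimal publish/subscribe sketch: each subscriber receives transformed
    updates together with provenance metadata captured at delivery time."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, feature):
        record = {
            "data": transform(feature),
            "metadata": {
                "source": "city-road-supplier",  # hypothetical provider name
                "received": datetime.now(timezone.utc).isoformat(),
            },
        }
        for callback in self.subscribers:
            callback(record)

received = []
feed = RoadFeed()
feed.subscribe(received.append)
feed.publish({"id": "seg-101", "geometry": [(0.0, 0.0), (1.0, 0.0)]})
print(received[0]["metadata"]["source"])
```

Because transformation and metadata capture happen inside the delivery path, every record the subscriber holds is already in the target schema and already documented.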
Sometimes things move more slowly than we might hope, but, slowly and surely, this view of data integration is taking hold.