
Convergence – the Structured Data Revolution

The past two decades have seen a phenomenal convergence of technologies. Media once regarded as discrete, each with its own distinct expertise and technology, have almost overnight become one and the same. Take telephone, television, radio, and photography, for example. Fifty years ago, these were all completely separate technical domains with little in common other than that they exploited different parts of the electromagnetic spectrum. Today, it is all simply about the transmission and processing of data. Of course, there are specialized data transformations and encodings, but these are far removed from the underlying physics of the media involved. This is a revolution of unprecedented scale in the history of the world.

It should be noted as well that this first convergence is about so-called unstructured data. While such data does have internal structure, that structure has little to do with the information content and everything to do with the mechanisms for data transmission, presentation, and distribution. We might call this “Convergence Part I – Unstructured Data”, as I believe we are entering a new era of convergence, one that I will label “Convergence Part II – Structured Data”. While one might have thought that these would proceed in the opposite order (structured data was, after all, already data), the issues with structured data are actually more complex, as they center on meaning.

Where might such structured data convergence be taking place?  Can we learn anything from the history of unstructured data convergence?

It might be helpful to look at what the technologies of Part I allowed us to do before they converged. Television enabled the wide area transmission of moving pictures. Radio enabled the transmission of audible sound and speech without wires and over large areas. Telephone enabled the transmission of speech (via wires) from one person to another, while photography enabled the capture and recording of visual scenes. We can see the Part I convergence in terms of enabling the acquisition and transmission of information so that it could be reconstituted for remote presentation. Convergence meant that the individual networks established for each of the separate technologies (e.g. telephone, radio) could be used to carry other kinds of information (e.g. pictures), eventually resulting in a more or less single network able to carry all of the different kinds of media.

What might we anticipate for Part II?  What sort of convergence are we talking about here?

My claim is that this second convergence is all about the management of information concerning events or projects (a project is simply an event of long duration), whether that is an aircraft landing, the segmentation of airspace to enable a public airshow, urban renewal in the inner city, the construction of a new highway, a G20 summit or Olympic Games event, a terrorist explosion, or an environmental catastrophe caused by an oil spill.  For all of these events, we can exploit common technologies and standards that enable the structured description of the event or project, controlled communication of the information to other participants, integration of that information into the participant data stores and applications, and presentation of the information in a manner that is useful within the context of the event or project.

Note that a structured data convergence implies that we are exploiting common tools and technologies that are aware of the “meaning” of data and not merely its structure. This is the essence of structured data. For many, this will immediately bring forth visions of ontologies and a reasoning infrastructure. I, too, believe we will get there, but I think that is another generation away. For now, we should be working with ontology “lite”, meaning schemas, classification hierarchies, and associations. More will come later. Even with these few items in place, an enormous amount of convergence is possible. We can have common notions of a data advertisement, subscription, and publication. We can define standard means of encoding geometric and topological information, observations, and authorized features. We can define generic interfaces for insert/update/delete that work across wide area networks and against a spectrum of datastore technologies and vendors. We could do all of these things using different approaches in each area – one approach for air traffic management, another for the management of electrical generation and distribution, another for urban planning and transportation, and yet another for public safety and security … OR we could move along the path of convergence.
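
To make the idea of a generic insert/update/delete interface with subscription and publication a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the names Feature, FeatureStore, InMemoryStore, and the "AirspaceRestriction" type are hypothetical and do not come from any particular standard or product. The point is simply that the same small contract could sit in front of very different datastores and serve very different domains.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


@dataclass
class Feature:
    """A structured record: an id, a classification, and named properties."""
    feature_id: str
    feature_type: str  # hypothetical classification, e.g. "AirspaceRestriction"
    properties: Dict[str, object] = field(default_factory=dict)


class FeatureStore(Protocol):
    """Generic insert/update/delete contract, independent of the backend."""

    def insert(self, feature: Feature) -> None: ...
    def update(self, feature_id: str, changes: Dict[str, object]) -> None: ...
    def delete(self, feature_id: str) -> None: ...


Callback = Callable[[str, Feature], None]


class InMemoryStore:
    """One possible backend (assumption: a real one would wrap a database)."""

    def __init__(self) -> None:
        self.features: Dict[str, Feature] = {}
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, feature_type: str, callback: Callback) -> None:
        # Register interest in every change to features of one type.
        self._subscribers.setdefault(feature_type, []).append(callback)

    def _publish(self, action: str, feature: Feature) -> None:
        # Notify only the subscribers of this feature's type.
        for callback in self._subscribers.get(feature.feature_type, []):
            callback(action, feature)

    def insert(self, feature: Feature) -> None:
        if feature.feature_id in self.features:
            raise KeyError(f"duplicate feature id: {feature.feature_id}")
        self.features[feature.feature_id] = feature
        self._publish("insert", feature)

    def update(self, feature_id: str, changes: Dict[str, object]) -> None:
        feature = self.features[feature_id]  # KeyError if unknown
        feature.properties.update(changes)
        self._publish("update", feature)

    def delete(self, feature_id: str) -> None:
        feature = self.features.pop(feature_id)  # KeyError if unknown
        self._publish("delete", feature)


if __name__ == "__main__":
    store = InMemoryStore()
    store.subscribe("AirspaceRestriction",
                    lambda action, f: print(action, f.feature_id, f.properties))
    store.insert(Feature("a1", "AirspaceRestriction", {"reason": "airshow"}))
    store.update("a1", {"ceiling_ft": 5000})
    store.delete("a1")
```

An air traffic application and an urban planning application could both be written against FeatureStore and differ only in the feature types and schemas they use, which is the kind of reuse convergence is meant to deliver.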