
The Architecture of the GeoWeb

In the run-up to GeoWeb 2007, it is helpful to start thinking about the various interpretations of the GeoWeb. Some think of the GeoWeb as an aggregation of Spatial Data Infrastructures. Others think of it in terms of data supply chains, analogous to the physical supply chains for manufactured goods. Still others see the GeoWeb as a particular instantiation of the Semantic Web. All are valid interpretations. In this note we want to take a more basic perspective and compare, on an architectural basis, the conventional World Wide Web and the GeoWeb, which of course builds upon the former. As we shall see, the two have many things in common.

The conventional web is based on text indexes, as shown in Figure 1. A Search Engine is in effect a large index: it responds to word queries from a browser by returning text summaries together with the address of the web site (HTTP Server) where the referenced web pages or files can be found. It is important to realize that the web content does not reside at the search engine, but rather "out on the web". The Search Engine is populated by a so-called Robot or Web Crawler. This is a program that is given a set of starting HTTP addresses on the web and that understands the structure of HTML documents. It "crawls" the web by reading each HTML page, creating a summary of the page content, and then looking for and following any embedded links (which are again HTTP addresses). The Web Crawler continues in this manner until there are no more pages to traverse from the starting set of nodes. The browser uses the information returned by the Search Engine (e.g. Google, Yahoo) to construct a list of alternatives for the user, who then selects one, and the browser requests the actual content from the web site where it resides. For this entire mechanism to work we need only a few simple standards, namely HTML (document structure with embedded links to other web pages) and HTTP (the protocol for requesting and delivering information between the browser and a web site, and between the crawler and a web site). Note that the search engine is itself a web site. Over time this basic mechanism has been extended to include non-HTML files (such as SVG, Word documents, GML and images) and, more recently, KML files.


Figure 1. Conventional Web Search Architecture
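
To make the crawl-and-index loop concrete, here is a minimal Python sketch. It is not how any particular search engine works. The names `PageParser`, `crawl` and `FAKE_WEB`, and the example.org addresses, are all made up for illustration, and a dictionary of HTML strings stands in for real HTTP requests so the example runs without a network. It also assumes every link is an absolute address and does no ranking or summarizing beyond a word-to-address index.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
import re


class PageParser(HTMLParser):
    """Collects the link targets and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(seeds, fetch):
    """Breadth-first crawl from the seed addresses.

    `fetch` is any function mapping an address to HTML text, or None
    when the page cannot be retrieved. Returns a word -> set of
    addresses index.
    """
    index = defaultdict(set)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to index or follow
        parser = PageParser()
        parser.feed(html)
        words = re.findall(r"[a-z0-9]+", " ".join(parser.text).lower())
        for word in words:
            index[word].add(url)
        for link in parser.links:
            if link not in seen:  # visit each address only once
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical two-page "web" used in place of real HTTP servers.
FAKE_WEB = {
    "http://example.org/a":
        '<h1>GeoWeb</h1><a href="http://example.org/b">next</a>',
    "http://example.org/b":
        '<p>Spatial index notes</p>'
        '<a href="http://example.org/a">back</a>'
        '<a href="http://example.org/missing">gone</a>',
}

index = crawl(["http://example.org/a"], FAKE_WEB.get)
print(sorted(index["spatial"]))
```

The index holds only words and addresses. The pages themselves stay in `FAKE_WEB`, mirroring the point above that content lives "out on the web" rather than at the search engine.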


The GeoWeb follows a similar model, except that it extends the picture to account for the fact that a great deal of information has an associated location or extent on the earth's surface. This spatial relationship can be relatively simple (e.g. the location of a building) or quite complex (e.g. geographic references in the body of a document). For a GeoWeb to work, we need to extend the Search Engine's index to incorporate some type of spatial index, such as an R-Tree, a Quad-Tree or some combination of the two. This enables objects to be indexed by their location (where they are on the earth's surface) rather than just by keyword. This can proceed in a couple of different directions. Documents may contain references to places (e.g. place names), which can then be located and spatially indexed. Other information is inherently spatial in nature, such as a GIS data set, an ESRI Shapefile or a GML file. These can also be spatially indexed by an appropriate Spatial Robot, e.g. one that can construct and update R-trees by reading files on the Internet. Google has already implemented such a strategy using KML files. Clearly other spatial file types, such as GML files and Shapefiles, can be "crawled" and indexed in a similar manner.

Of course a great deal (the preponderance) of geospatial information does not live in files at all, but rather in managed applications or in databases (the same can be said of many other types of data). To access this information the Web Crawler (spatial robot) will need to connect to these databases in some way and determine the extent of the objects they contain. In some cases this might be handled through Web Feature Service (WFS) interfaces, since a WFS is able to provide the extent, as a minimum bounding rectangle (MBR), of each of the feature types that it offers. This would make the task of the Web Crawler particularly simple. A GeoWeb version of Figure 1 might thus look as shown in Figure 2.


Figure 2. GeoWeb Search Architecture
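
As a rough illustration of the spatial index discussed above, the following Python sketch implements a small quadtree keyed by MBR. It is only one possible design, not the structure any search engine is known to use. The class name `QuadTree`, the helper functions, the example.org addresses and the approximate coordinates are all assumptions made for the example. Each entry is stored at the deepest quadrant that fully contains its MBR, and MBRs are written as (min longitude, min latitude, max longitude, max latitude).

```python
def intersects(a, b):
    """True when two MBRs (min_x, min_y, max_x, max_y) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True when `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


class QuadTree:
    """Quadtree that stores each entry at the deepest node containing its MBR."""

    def __init__(self, bounds, depth=0, max_depth=8):
        self.bounds = bounds
        self.depth = depth
        self.max_depth = max_depth
        self.items = []      # (mbr, url) pairs held at this node
        self.children = {}   # quadrant bounds -> child QuadTree

    def _quadrants(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return [(x0, y0, mx, my), (mx, y0, x1, my),
                (x0, my, mx, y1), (mx, my, x1, y1)]

    def insert(self, mbr, url):
        if self.depth < self.max_depth:
            for quad in self._quadrants():
                if contains(quad, mbr):
                    child = self.children.get(quad)
                    if child is None:
                        child = QuadTree(quad, self.depth + 1, self.max_depth)
                        self.children[quad] = child
                    child.insert(mbr, url)
                    return
        # The MBR straddles a quadrant boundary (or depth limit reached).
        self.items.append((mbr, url))

    def query(self, window):
        """Return the addresses whose MBR intersects the search window."""
        hits = [url for mbr, url in self.items if intersects(mbr, window)]
        for child in self.children.values():
            if intersects(child.bounds, window):
                hits.extend(child.query(window))
        return hits


# Hypothetical entries: a building (a point), a regional data set, a far-away page.
spatial_index = QuadTree((-180.0, -90.0, 180.0, 90.0))
spatial_index.insert((-123.12, 49.28, -123.12, 49.28), "http://example.org/building.kml")
spatial_index.insert((-139.0, 48.0, -114.0, 60.0), "http://example.org/region.gml")
spatial_index.insert((2.35, 48.85, 2.35, 48.85), "http://example.org/paris.html")

hits = sorted(spatial_index.query((-124.0, 49.0, -122.0, 50.0)))
print(hits)
```

A spatial robot would call `insert` with whatever extent it can determine for a resource: coordinates read from a KML or GML file, a place name resolved to a location, or the MBR a WFS reports for a feature type. As with the text index, only the extent and the address are stored, not the data itself.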


It is likely that all of the big search engines will move to spatial crawling in the near future. This will add a new and exciting spatial dimension to the web – making the GeoWeb a reality all that much sooner.


