
Data Virtualization

Originally published on LinkedIn:

Tagging and Organizing Piles of Papers into a Registry

Over and over, in long-established industries such as oil and gas, we see the same problem: a piece of data that could have saved a project millions of dollars never reaches the project team. Most often this is because that piece of data is buried under a mountain of Microsoft Word documents, spreadsheets, relational databases, PDF files, and text files belonging to one or more closely related projects.

Is there a way to discover this information without gathering all of these data sources from many physical locations into a single database? Huge investments may already have been made that depend on the physical location and current format of the data, and discarding them may be too expensive.

A modern Registry with data virtualization might just be the answer. But what is a modern Registry?

A modern Registry is a data store that communicates with its client applications in terms of business objects. These objects may have associations with one another, may be classified in different ways, may be organized into collections, and may have different life cycles. The Registry relieves client applications of the burden of referring to and querying data in terms of SQL tables and columns.
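To make the idea of "business objects instead of SQL tables" more concrete, here is a minimal in-memory sketch. It assumes a deliberately simplified model: every class, method, lifecycle state and association name below (`RegistryObject`, `Registry`, `Status`, `"HasReport"`, and so on) is hypothetical and made up for illustration. It is not the API of any particular Registry product. The client works only with objects, classifications, associations, collections and lifecycle status.

```python
from dataclasses import dataclass, field
from enum import Enum
import uuid


class Status(Enum):
    # Hypothetical lifecycle states for a registry object.
    SUBMITTED = "submitted"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


@dataclass
class RegistryObject:
    name: str
    object_type: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    classifications: set = field(default_factory=set)
    status: Status = Status.SUBMITTED


@dataclass(frozen=True)
class Association:
    source_id: str
    target_id: str
    association_type: str


class Registry:
    """Minimal in-memory registry that speaks in business objects, not SQL."""

    def __init__(self):
        self._objects = {}       # id -> RegistryObject
        self._associations = []  # list of Association
        self._collections = {}   # collection name -> list of object ids

    def submit(self, obj):
        self._objects[obj.id] = obj
        return obj

    def classify(self, obj, category):
        obj.classifications.add(category)

    def associate(self, source, target, association_type):
        self._associations.append(Association(source.id, target.id, association_type))

    def add_to_collection(self, collection_name, obj):
        self._collections.setdefault(collection_name, []).append(obj.id)

    def approve(self, obj):
        # Move the object along its (hypothetical) life cycle.
        obj.status = Status.APPROVED

    def find(self, category=None, object_type=None):
        # Query by business meaning rather than by table and column.
        return [
            o for o in self._objects.values()
            if (category is None or category in o.classifications)
            and (object_type is None or o.object_type == object_type)
        ]

    def related(self, obj, association_type):
        # Follow associations outward from obj.
        return [
            self._objects[a.target_id] for a in self._associations
            if a.source_id == obj.id and a.association_type == association_type
        ]

    def collection(self, name):
        return [self._objects[i] for i in self._collections.get(name, [])]


# Usage with made-up example objects.
registry = Registry()
well = registry.submit(RegistryObject("Well A-12", "Well"))
report = registry.submit(RegistryObject("A-12 drilling report", "Document"))
registry.classify(report, "Drilling")
registry.associate(well, report, "HasReport")
registry.add_to_collection("Project North", report)
registry.approve(report)
print([o.name for o in registry.related(well, "HasReport")])  # ['A-12 drilling report']
```

The client asks "which reports belong to this well?" and never writes a join. Where the objects are actually stored is left entirely to the Registry.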

Data virtualization is transparent access to data stored outside the Registry. Objects in the Registry can act as proxies for this external data, so the client application does not need to know, or implement, how the data is retrieved. The Registry handles retrieval on the client's behalf.
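The proxy idea can be sketched in a few lines. This is a minimal illustration that assumes two hypothetical retrieval schemes, a plain file and a SQLite query. The names `ProxyObject`, `VirtualRegistry`, `"file"` and `"sqlite"` are invented for the example, and so is the well data. The point is that only the Registry holds the resolver functions. A client asks for data by name, and the data is read lazily from wherever it actually lives.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


class ProxyObject:
    """Registry entry standing in for data stored outside the registry (hypothetical)."""

    def __init__(self, name, scheme, locator):
        self.name = name
        self.scheme = scheme    # how to reach the data, e.g. "file" or "sqlite"
        self.locator = locator  # where the data lives
        self._cache = None

    def fetch(self, resolvers):
        # Resolve lazily: external data is read only when a client asks for it.
        if self._cache is None:
            self._cache = resolvers[self.scheme](self.locator)
        return self._cache


def read_file(path):
    return Path(path).read_text(encoding="utf-8")


def run_sqlite_query(locator):
    db_path, query = locator
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(query).fetchall()


class VirtualRegistry:
    def __init__(self):
        self._proxies = {}
        # The registry, not the client, knows how each scheme is retrieved.
        self._resolvers = {"file": read_file, "sqlite": run_sqlite_query}

    def register(self, proxy):
        self._proxies[proxy.name] = proxy

    def fetch(self, name):
        return self._proxies[name].fetch(self._resolvers)


# Set up some made-up external data: one text file and one SQLite table.
tmp = Path(tempfile.mkdtemp())
notes = tmp / "notes.txt"
notes.write_text("Casing shoe at 2,400 m", encoding="utf-8")
db = tmp / "wells.db"
with closing(sqlite3.connect(str(db))) as conn:
    conn.execute("CREATE TABLE wells (name TEXT, depth_m REAL)")
    conn.execute("INSERT INTO wells VALUES ('A-12', 3150.0)")
    conn.commit()

registry = VirtualRegistry()
registry.register(ProxyObject("A-12 notes", "file", str(notes)))
registry.register(ProxyObject(
    "Deep wells", "sqlite", (str(db), "SELECT name FROM wells WHERE depth_m > 3000")))

print(registry.fetch("A-12 notes"))  # Casing shoe at 2,400 m
print(registry.fetch("Deep wells"))  # [('A-12',)]
```

Supporting a new source type, such as another database or a document store, means adding one resolver to the Registry. Client code stays unchanged.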

A virtual information model is created in the Registry, in which each object is a proxy for an actual data item, such as a spreadsheet, a Microsoft Word document, or the result set of a query on a relational database. The Registry can be configured to extract and store metadata from the actual data items automatically, and then to classify and associate the proxy objects. This makes it easy for client applications to discover and request data items through the Registry.
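The harvesting step can be illustrated with a small sketch. It makes several assumptions. Metadata here is limited to file-system attributes: name, extension, size and modification time. Reading document properties from Word or PDF files would require format-specific libraries, which are not shown. The classification rules (`TYPE_RULES`) and the project-code naming convention (`PRJ-042`) are hypothetical. The sketch scans a folder, creates one proxy per file, classifies each proxy by type and project, associates proxies that share a project, and answers discovery queries by classification.

```python
import re
import tempfile
from collections import defaultdict
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical classification rules: file extension -> document category.
TYPE_RULES = {".docx": "WordDocument", ".xlsx": "Spreadsheet", ".csv": "Spreadsheet",
              ".pdf": "PDF", ".txt": "Text"}
# Hypothetical naming convention: a project code such as "PRJ-042" in the file name.
PROJECT_CODE = re.compile(r"PRJ-\d{3}")


@dataclass
class Proxy:
    location: str
    metadata: dict
    classifications: set = field(default_factory=set)
    associations: set = field(default_factory=set)  # locations of related proxies


def extract_metadata(path):
    stat = path.stat()
    return {"name": path.name, "suffix": path.suffix.lower(),
            "size_bytes": stat.st_size, "modified": stat.st_mtime}


def harvest(root):
    proxies = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        meta = extract_metadata(path)
        proxy = Proxy(str(path), meta)
        # Classify by document type, then by project code if the name contains one.
        proxy.classifications.add(TYPE_RULES.get(meta["suffix"], "Other"))
        match = PROJECT_CODE.search(meta["name"])
        if match:
            proxy.classifications.add(match.group())
        proxies.append(proxy)
    # Associate every pair of proxies that share a project classification.
    by_project = defaultdict(list)
    for p in proxies:
        for c in p.classifications:
            if PROJECT_CODE.fullmatch(c):
                by_project[c].append(p)
    for group in by_project.values():
        for p in group:
            p.associations.update(q.location for q in group if q is not p)
    return proxies


def discover(proxies, *categories):
    # Return proxies carrying all of the requested classifications.
    return [p for p in proxies if set(categories) <= p.classifications]


# Usage with placeholder files standing in for real documents.
root = Path(tempfile.mkdtemp())
for name in ["PRJ-042 well plan.docx", "PRJ-042 costs.xlsx", "PRJ-007 survey.pdf", "readme.txt"]:
    (root / name).write_bytes(b"placeholder")

proxies = harvest(root)
hits = discover(proxies, "PRJ-042", "Spreadsheet")
print([p.metadata["name"] for p in hits])  # ['PRJ-042 costs.xlsx']
```

A client searching for "spreadsheets from project PRJ-042" finds the cost workbook. Through its association, it also learns that a related well plan exists, even though the two files were never moved into a common database.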

The key benefit of Registry Data Virtualization is the ability to bring together the data from all the different sources and in all their different formats so that these data assets can be discovered and reused.