Originally published on LinkedIn: https://www.linkedin.com/pulse/registries-missing-link-mastering-data-rob-sterpin 
There are a number of reasons why companies might find themselves with many different databases containing content in many different formats. Ongoing mergers and acquisitions activity is one cause; segmented business units are another. Regardless of how it happened, companies need to bring together some, if not all, of the information in these disparate databases in order to conduct business effectively and make good decisions.
For many companies, data is at the heart of their business. Either they provide data products to other businesses, or they rely heavily on their data in the course of doing business. Companies invest a lot of effort in building and managing their databases and making the data readily available to those who need it.
Why Integrate the Data?
In the modern age of interoperable systems, application mashups, and global data sharing, it is critical to be able to bring together data from multiple different sources. The reasons why, and the specific integration challenges that will need to be addressed, will depend on the purpose for which the data is being integrated.
In some cases, it may be necessary to bring data from different sources together to get an overall picture of what is going on. This is typically what is needed for a multi-agency common operating picture, or for business intelligence dashboards.
At the other end of the spectrum, multiple databases are merged to create a new database. This problem typically follows a merger, or arises when a company acquires one or more other companies and needs a single integrated database to support operations.
A registry provides a number of powerful features that can be leveraged in different ways within a data processing chain.
What are Some Typical Integration Challenges?
Even databases that look like they contain the same information can exhibit huge differences when it comes time to merge the data.
One of the biggest challenges is with the schema and data model of each source database. Not only is it likely that each database has a different underlying schema and data model, but there may also be differences in how the properties and fields are named. Quite apart from simple differences such as whether there is a single field for Name, or whether there are multiple fields such as First Name and Last Name, there may be variations in the property names such as First, First-Name, or firstName. Even more complexity arises when each database refers to the same thing by a different name, creating variations such as identifying the same place to eat as a Restaurant, a Diner, or an Eatery.
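One common way to handle the naming variations described above is an alias table that maps each variant to a single canonical name. The following is a minimal sketch in Python, not a prescribed method; the table contents and the function names canonical_field and canonical_type are assumptions made up for illustration.

```python
# Hypothetical alias tables. A real project would derive these from
# its own source schemas rather than hard-coding them.
FIELD_ALIASES = {
    "first": "first_name",
    "first-name": "first_name",
    "first name": "first_name",
    "firstname": "first_name",
}
TYPE_ALIASES = {
    "diner": "restaurant",
    "eatery": "restaurant",
}


def canonical_field(name: str) -> str:
    """Return the canonical field name, or the lowercased input if no alias is known."""
    key = name.strip().lower()
    return FIELD_ALIASES.get(key, key)


def canonical_type(name: str) -> str:
    """Return the canonical entity type, or the lowercased input if no alias is known."""
    key = name.strip().lower()
    return TYPE_ALIASES.get(key, key)
```

With these tables, First, First-Name, and firstName all resolve to first_name, and Restaurant, Diner, and Eatery all resolve to restaurant. Names that are not in a table pass through lowercased, so unmapped fields stay visible instead of being silently dropped.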
This challenge is compounded by the data itself. Setting aside the myriad possibilities for data entry errors, the same data could be entered in more than one way. For example, the company name “Galdos Systems Inc.” could be entered as “Galdos” or “Galdos Systems” or “Galdos Inc.” or with the “Inc.” spelled out as “Incorporated”. Even if the data model and properties are correctly mapped, this example could result in several different entries for the same company.
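Part of this variation can be reduced by normalizing names before comparing them. The sketch below is one possible approach, assuming a small hand-written list of legal suffixes; the list and the function name normalize_company are illustrative assumptions only.

```python
import re

# Assumed list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "corp", "corporation"}


def normalize_company(name: str) -> str:
    """Lowercase the name, strip punctuation, and drop trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


variants = [
    "Galdos Systems Inc.",
    "Galdos Systems Incorporated",
    "Galdos Systems",
    "Galdos Inc.",
    "Galdos",
]
keys = {normalize_company(v) for v in variants}
```

Here the five variants collapse to two keys, "galdos systems" and "galdos". Suffix stripping alone cannot tell whether those two refer to the same company; that decision needs fuzzy matching or a human reviewer.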
In the ideal situation, multiple entries for the same data are combined into a single entry, duplicate information is removed, and all unique information is retained. Some of this process can be automated, but it often requires human intervention to do a final cleanup.
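The automated part of that process can be sketched as a merge that keeps every distinct value and flags conflicts for review. This is a minimal illustration, assuming the entries have already been matched as the same entity; merge_records and the sample values are hypothetical.

```python
def merge_records(records):
    """Merge dicts describing the same entity, keeping every distinct value per field."""
    collected = {}
    for record in records:
        for field_name, value in record.items():
            if value in (None, ""):
                continue  # skip empty values so they never overwrite real data
            values = collected.setdefault(field_name, [])
            if value not in values:
                values.append(value)  # drop exact duplicates, keep unique values
    # Single-valued fields collapse to a scalar; conflicting fields stay as
    # lists so that a person can resolve them during final cleanup.
    return {f: v[0] if len(v) == 1 else v for f, v in collected.items()}


# Made-up sample entries for illustration only.
entries = [
    {"name": "Galdos Systems Inc.", "city": "Springfield", "phone": ""},
    {"name": "Galdos Systems Inc.", "phone": "555-0100"},
    {"name": "Galdos Systems Inc.", "city": "Shelbyville"},
]
merged = merge_records(entries)
```

In this example the name and phone fields each end up with a single value, while city keeps both "Springfield" and "Shelbyville" as a list, marking it as a conflict that needs a human decision.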
Additional challenges can arise depending on whether the databases are being merged physically to create a new database, or merged virtually, leaving the data to be managed in the source databases. Applications that use the merged data may add a further level of complexity if they have yet another data model and schema that must be considered.
How can Registries Help?
Registries provide a great deal of power and flexibility for solving some of these data integration challenges.
In the case of physically combining multiple databases into a single database, registries can be programmed with bridges between each source database and the new master database. Each bridge contains all the necessary mappings and transformations so that the data can flow through from one of the source databases and be ingested into the new database in the correct format.
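To make the idea of a bridge concrete, the sketch below models one as a field mapping plus an ordered list of transformations, with one bridge per source database. It does not represent any particular registry product's API; the Bridge class, the split_name helper, and the source names "crm" and "billing" are all assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Bridge:
    """Hypothetical bridge: renames fields, then applies record-level transformations."""
    field_map: Dict[str, str]
    transforms: List[Callable[[Record], Record]] = field(default_factory=list)

    def apply(self, source: Record) -> Record:
        # Rename mapped fields; unmapped fields pass through unchanged.
        target = {self.field_map.get(k, k): v for k, v in source.items()}
        for transform in self.transforms:
            target = transform(target)
        return target


def split_name(record: Record) -> Record:
    """Split a single 'name' field into first_name and last_name at the first space."""
    if "name" not in record:
        return record
    out = dict(record)
    first, _, last = out.pop("name").partition(" ")
    out["first_name"], out["last_name"] = first, last
    return out


# One bridge per (hypothetical) source database, all targeting the same master schema.
BRIDGES = {
    "crm": Bridge(field_map={"First-Name": "first_name", "Last-Name": "last_name"}),
    "billing": Bridge(field_map={"Name": "name"}, transforms=[split_name]),
}

from_crm = BRIDGES["crm"].apply({"First-Name": "Ada", "Last-Name": "Lovelace"})
from_billing = BRIDGES["billing"].apply({"Name": "Ada Lovelace"})
```

Both calls produce a record with first_name "Ada" and last_name "Lovelace", even though the two sources store the name differently. Keeping the mappings as data rather than code means a new source database can be added by registering another bridge, without changing the ingestion logic.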
In the case of virtually merging multiple databases, a registry would be integrated into the infrastructure or service bus that connects the source databases with the applications that use the data. Depending on the application and how the data will be used, the registry keeps the data up to date and in sync.
I will look in more detail at some of these challenges in future posts.