Registries – Beyond Databases Part 2 – Built-In Data Governance

This is the second post in our series about registries.

In Part 1 of this series we suggested that, for a wide range of data-centric applications, a registry is a better starting point than a database because it can greatly accelerate development and reduce time to market.

Some other useful features of a registry include:

  1. A powerful NoSQL data model supporting geography and semantics
  2. Automatic notification on changes to any data object
  3. Built-in automated data harvesting
  4. A plug-in architecture for data transformations
  5. Open standard web service interfaces
  6. A suite of data governance mechanisms

This blog looks in greater detail at the topic of data governance.

Built-In, Extensible Data Governance

Registries provide a variety of mechanisms for data governance. Rather than prescribing a particular governance procedure (e.g. ISO 19135), a registry provides the mechanisms from which many such procedures can be realized. Developers and administrators can use these mechanisms to support whatever specific governance procedures they require. Note that the data governance discussed in this post applies to service descriptions (i.e. SOA governance) just as it does to data descriptions.

Globally Unique Identifier

To allow an object to be identified regardless of the values of its properties, every object in a registry is automatically assigned an application-level globally unique identifier. This identifier never changes over the lifetime of the object. Having a globally unique identifier as a property of every object in a registry provides the basic means by which all other data governance functionality is achieved.
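As a rough illustration of the idea, the sketch below assigns an identifier once, at creation time, and exposes it as a read-only property. The `RegistryObject` class and its fields are hypothetical names invented for this example, and the choice of a UUID URN is an assumption; it is not the API or identifier scheme of any particular registry product.

```python
import uuid


class RegistryObject:
    """Hypothetical registry object; names are illustrative only."""

    def __init__(self, object_type, **properties):
        # Assigned once at creation and never reassigned (assumed UUID URN scheme).
        self._id = uuid.uuid4().urn
        self.object_type = object_type
        self.properties = dict(properties)

    @property
    def id(self):
        # Read-only: there is no setter, so the identifier cannot be changed.
        return self._id


parcel = RegistryObject("land parcel", classification="crown", area_m2=1200)
original_id = parcel.id

# Property values may change freely; the identifier stays the same.
parcel.properties["classification"] = "private"
```

Because the identifier does not depend on any property value, other records (audit entries, relationships, state changes) can refer to the object by `id` even after every one of its properties has been edited.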

Life Cycle States and State Transitions

The fundamental construct for data governance is a list of life cycle states. These are the states that the object goes through over its life cycle. For a software artefact these states might be {submitted, approved, deprecated, withdrawn}, while for a physical asset such as a building they might be {submitted, planned, approved, under construction, completed, operational, slated for demolition, demolished}. Administrators can create a list of any such life cycle states deemed appropriate, and assign them on the basis of a specific object type. Hence a ‘building permit’ can have a different set of life cycle states than a ‘mineral lease’ or a ‘software library’.

In addition to having control over the list of life cycle states, an administrator can also specify the allowed transitions between these states. While it probably makes sense to allow an object to go from being “submitted” to being “approved”, it may not make sense to allow it to go from “submitted” to “deprecated”, since the deprecated state typically follows the state of being approved.

Administrators can easily create a set of life cycle states and state transitions in a registry to best model their data governance environment. The registry respects the states and state transitions, and any transaction that attempts to change the life cycle state of an object in an illegal manner will fail and be flagged with an exception.
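The behaviour described above can be sketched as a small state machine. Everything here is an assumption made for illustration: the `LifeCycle` class, the `change_state` function, the `IllegalTransitionError` exception, and the particular set of allowed transitions (the text only says that submitted to approved is sensible and submitted to deprecated is not). A real registry would expose this through its own configuration and transaction interfaces.

```python
class IllegalTransitionError(Exception):
    """Raised when a requested life cycle state change is not allowed."""


class LifeCycle:
    """Hypothetical per-object-type list of states and allowed transitions."""

    def __init__(self, states, transitions):
        self.states = set(states)
        self.allowed = {state: set() for state in self.states}
        for source, target in transitions:
            if source not in self.states or target not in self.states:
                raise ValueError(f"unknown state in {source!r} -> {target!r}")
            self.allowed[source].add(target)

    def check(self, current, requested):
        if requested not in self.allowed.get(current, set()):
            raise IllegalTransitionError(
                f"transition {current!r} -> {requested!r} is not allowed"
            )


# Life cycles are assigned per object type (transitions are assumed examples).
LIFE_CYCLES = {
    "software library": LifeCycle(
        states=["submitted", "approved", "deprecated", "withdrawn"],
        transitions=[
            ("submitted", "approved"),
            ("submitted", "withdrawn"),
            ("approved", "deprecated"),
            ("deprecated", "withdrawn"),
        ],
    ),
}


def change_state(obj, requested):
    """Apply a state change, or raise if the transition is illegal."""
    LIFE_CYCLES[obj["type"]].check(obj["state"], requested)
    obj["state"] = requested


library = {"type": "software library", "state": "submitted"}
change_state(library, "approved")  # allowed
```

Calling `change_state` on a freshly submitted object with `"deprecated"` would raise `IllegalTransitionError`, which mirrors the failed transaction described above.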

Built-In Audit Trail

A critical component of any data governance system is the ability to retain a history of what has happened, including life cycle changes and any other changes undergone by the application objects. Registries support this requirement by providing an application-level audit trail that records every change to a registered object. For each change, the audit trail records:

  1. The object’s globally unique identifier
  2. The user (either human or machine) responsible for the change
  3. The operation (e.g. creation, update, delete, state change) that caused the change
  4. The date and time that the operation was performed

The audit trail records all changes to every registry object, including changes to an object’s classification, geographic properties, and life cycle states.
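A minimal sketch of such an audit trail, holding the four items listed above for each change, might look like the following. The `AuditEntry` and `AuditTrail` names, the in-memory list, and the example identifiers and user names are all assumptions made for this example; an actual registry would persist entries and expose them through its own query interface.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One audit record; field names are illustrative."""
    object_id: str       # the object's globally unique identifier
    user: str            # human or machine responsible for the change
    operation: str       # e.g. "create", "update", "delete", "state change"
    timestamp: datetime  # when the operation was performed


class AuditTrail:
    """Hypothetical append-only, in-memory audit trail."""

    def __init__(self):
        self._entries = []

    def record(self, object_id, user, operation):
        entry = AuditEntry(object_id, user, operation,
                           datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def history(self, object_id):
        # All recorded changes for one object, in the order they occurred.
        return [e for e in self._entries if e.object_id == object_id]


trail = AuditTrail()
trail.record("urn:uuid:parcel-1", "alice", "create")
trail.record("urn:uuid:parcel-2", "harvester-bot", "create")
trail.record("urn:uuid:parcel-1", "alice", "state change")

parcel_history = trail.history("urn:uuid:parcel-1")
```

Keying every entry on the object's identifier is what lets the full history of a single object be retrieved no matter how its properties have changed.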

The audit trail exists at the application level where the object changes are immediately understandable in the application context. Changes to objects (e.g. modified land parcel surface area, or land parcel geometry), object relationships (e.g. land parcel has a new owner), and changes to the object’s classification (e.g. land parcel classification changed from “crown” to “private”) can all be immediately recognized on browsing the audit trail.

As this is an application level audit trail, it only makes sense that the auditing level of detail is also controlled by the application (on a per request or per transaction basis). Applications require fine-grained control of what needs to be audited and when it needs to be audited. For example:

  1. Turning off auditing when executing bulk operations (e.g. initial bulk load of millions of records)
  2. Providing fine-grained auditing for specific end-user operations
  3. Tracking exact changes of an object’s properties (i.e. keep track of previous values)

In order to provide better support for fine-grained auditing, registries use the concept of audit levels. The exact set of supported audit levels will depend on the registry implementation; however, most registries provide support for the following audit levels:

  1. No auditing
  2. Standard auditing: the audit trail contains core properties as described above
  3. Previous state: in addition to the standard auditing, persist the previous state of the record with all its properties
  4. Diff: in addition to the standard auditing, persist only the differences between the two record states
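To make the four levels concrete, here is a hedged sketch in which the caller chooses the audit level on each request. The `AuditLevel` enum, the `audit_update` function, and the dictionary layout of an entry are hypothetical; they show one plausible reading of the levels above, not the behaviour of a specific registry implementation.

```python
from datetime import datetime, timezone
from enum import Enum


class AuditLevel(Enum):
    NONE = 0            # no auditing (e.g. initial bulk load)
    STANDARD = 1        # core properties only
    PREVIOUS_STATE = 2  # core properties plus the full previous record
    DIFF = 3            # core properties plus only the changed values


def audit_update(object_id, user, old, new, level):
    """Build an audit entry for an update, or None if auditing is off."""
    if level is AuditLevel.NONE:
        return None
    entry = {
        "object_id": object_id,
        "user": user,
        "operation": "update",
        "timestamp": datetime.now(timezone.utc),
    }
    if level is AuditLevel.PREVIOUS_STATE:
        entry["previous"] = dict(old)
    elif level is AuditLevel.DIFF:
        # Map each changed property to its (old value, new value) pair.
        entry["diff"] = {
            key: (old.get(key), new.get(key))
            for key in old.keys() | new.keys()
            if old.get(key) != new.get(key)
        }
    return entry


before = {"classification": "crown", "area_m2": 1200}
after = {"classification": "private", "area_m2": 1200}

diff_entry = audit_update("urn:uuid:parcel-1", "alice", before, after,
                          AuditLevel.DIFF)
# diff_entry["diff"] == {"classification": ("crown", "private")}
```

Passing the level as an argument reflects the per-request control described earlier: a bulk loader could pass `AuditLevel.NONE`, while an end-user edit could pass `AuditLevel.DIFF` to keep track of previous values without storing the whole record.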