Taming Unstructured Data

Originally published on LinkedIn: https://www.linkedin.com/pulse/taming-unstructured-data-eddie-yip

In the article “What Does IoT + Big Data Mean to You?” (http://www.ecnmag.com/blog/2015/04/what-does-iot-big-data-mean-you), ECN reports that by 2020 the estimated volume of digital data will be 44 zettabytes. That’s 44 trillion gigabytes! How we make use of all that data depends on how well we can manage, analyze, and process it into useful information.

Some pundits have said that we will be connected to any and every device, and will access all that data instantly with a tap on our smartphones. I, however, do not subscribe to this view. I see a world where there will be many networks of IoT devices. These networks will be owned by corporations (big and small) whose business is to gather raw data from devices and process it into useful information products for sale to other businesses and to individuals. Of course, there will be small, personal IoT networks such as home security systems, but I’m referring here to networks with hundreds or thousands of devices.

The raw data from just one device may be useful in some circumstances, but it is the aggregate data from multiple devices that brings real value. A set of devices providing temperature readings over an area will yield a more accurate weather forecast once the readings are processed together. Likewise, a set of devices that measures the flow of shoppers through a supermarket can help a manager place the right products in the right aisles to maximize purchases.

Raw data has a shelf life, a property that is often overlooked. If the shelf life is short, the value will drop to zero unless the raw data is processed into useful information in a timely fashion. For example, a late weather forecast for a destination is useless to a pilot who needs it to decide whether or not to take off.
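To make the shelf-life idea concrete, here is a minimal Python sketch. The function name `is_fresh` and the one-hour shelf life are my own assumptions for illustration; they do not come from any standard or product.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(taken_at, shelf_life, now):
    """Return True while a reading is still within its shelf life."""
    return now - taken_at <= shelf_life


# Hypothetical reading taken at 09:00 UTC with an assumed one-hour shelf life.
taken_at = datetime(2015, 4, 1, 9, 0, tzinfo=timezone.utc)
shelf_life = timedelta(hours=1)

fresh_at_0930 = is_fresh(taken_at, shelf_life, taken_at + timedelta(minutes=30))
fresh_at_1100 = is_fresh(taken_at, shelf_life, taken_at + timedelta(hours=2))
```

A processing pipeline could use a check like this to skip readings that have already lost their value.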

Raw data is unstructured. Consider a set of readings over a given period of time. The number of readings can differ across different devices but they all have metadata, such as the ID of the device, the time the reading was made, the location of the device, the range of the device (another spatial property), and more. Metadata gives context to unstructured data. Metadata also makes unstructured data accessible because, among other things, you can categorize the data, locate the device on a map, show the measurement area (or range) of the device, and report on its working status. With good metadata management, you can quickly search for the unstructured data you need to create information products promptly for your customers.
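As a rough illustration of how metadata makes raw readings searchable, the Python sketch below models the metadata fields mentioned above. The class name `ReadingMetadata`, the field names, the `category` field, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReadingMetadata:
    device_id: str      # ID of the device
    taken_at: datetime  # time the reading was made
    latitude: float     # location of the device
    longitude: float
    range_m: float      # measurement range, assumed here to be a radius in metres
    category: str       # assumed category label used for searching


def find_readings(catalog, category, since):
    """Return metadata records in a category taken at or after `since`."""
    return [m for m in catalog if m.category == category and m.taken_at >= since]


def utc(hour, minute=0):
    """Helper for building sample timestamps on one hypothetical day."""
    return datetime(2015, 4, 1, hour, minute, tzinfo=timezone.utc)


# Hypothetical catalog entries.
catalog = [
    ReadingMetadata("dev-001", utc(9), 49.28, -123.12, 500.0, "temperature"),
    ReadingMetadata("dev-002", utc(10, 30), 49.29, -123.10, 500.0, "temperature"),
    ReadingMetadata("dev-003", utc(11), 49.27, -123.11, 30.0, "foot-traffic"),
]

recent_temperature = find_readings(catalog, "temperature", since=utc(10))
```

The search never touches the raw readings themselves; it works entirely on the metadata, which is the point of managing metadata well.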

Modern Registries (as I have described in previous posts) are ideal for metadata management because at their core is the OASIS ebRIM data model (http://docs.oasis-open.org/regrep/v3.0/specs/regrep-rim-3.0-os.pdf), which was designed specifically to manage services and their metadata. The ebRIM model defines a blank object called an Extrinsic Object. This object can be used to describe the metadata of any IoT device type on your network. I’ve highlighted the Extrinsic Object in the OASIS ebRIM Information Model Inheritance View below:

Information model showing objects that inherit properties from the Registry Object

An Extrinsic Object is defined as “the prime metadata class container for a Repository Item”. A Repository Item is any binary data that is persisted locally in the Registry; here, it can be used to house metadata about an IoT device. The ebRIM standard defines a mimeType attribute on the Extrinsic Object, holding an IANA MIME type that identifies the type of its Repository Item.

An Extrinsic Object (your device) inherits all the properties of its superclass (parent) Registry Object. These are:

Table of Properties for an Extrinsic Object

Notice the lid attribute, which you can use to uniquely identify and retrieve the device record. You can bind your device to a specific objectType, where the set of object types is defined by a user-specified classification scheme. With the status attribute, your device can have a set of life-cycle states that are also configured as a classification scheme. Life-cycle status can provide information about things such as the planned maintenance of your devices.
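The following Python sketch shows one way a device record with lid, objectType, and status attributes might look in application code. It is a simplified illustration, not the ebRIM XML schema: the class `DeviceRecord`, the life-cycle states, and the `urn:example:` identifiers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical life-cycle states; a real Registry would hold these
# as nodes of a classification scheme rather than a hard-coded set.
LIFECYCLE_STATES = {"Planned", "Active", "UnderMaintenance", "Retired"}


@dataclass
class DeviceRecord:
    lid: str                              # logical identifier used for lookup
    object_type: str                      # node in a user-defined object-type scheme
    status: str = "Planned"               # current life-cycle state
    mime_type: str = "application/json"   # MIME type of the repository item

    def set_status(self, new_status):
        """Move the device to another life-cycle state, rejecting unknown ones."""
        if new_status not in LIFECYCLE_STATES:
            raise ValueError(f"unknown life-cycle state: {new_status}")
        self.status = new_status


# A tiny in-memory stand-in for the Registry, keyed by lid.
record = DeviceRecord(
    lid="urn:example:device:thermo-17",
    object_type="urn:example:objectType:TemperatureSensor",
)
registry = {record.lid: record}

registry["urn:example:device:thermo-17"].set_status("UnderMaintenance")
```

Restricting status values to a configured set is what lets the Registry report reliably on things like planned maintenance.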

As you can see from the Relationship View diagram below, your device can have additional attributes (Slots in ebRIM parlance). You can define as many attributes as you need, including one or more spatial attributes. This means your device can have a position (a point attribute) as well as a valid range (an area attribute). A Modern Registry can use these attributes to spatially search for other devices in your network.

Diagram showing the relationships between a Registry Object and other objects
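To illustrate how spatial attributes can drive a search, here is a minimal Python sketch that stores a position and a range as slots and finds the devices whose range covers a given point. The slot names `position` and `rangeMetres`, the circular range, and the haversine distance are simplifying assumptions of mine; a Modern Registry would use its own spatial types and indexes.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Device:
    lid: str
    slots: dict = field(default_factory=dict)  # additional attributes by name


def devices_covering(devices, lat, lon):
    """Return the lids of devices whose circular range contains the point."""
    hits = []
    for device in devices:
        position = device.slots.get("position")   # (lat, lon) point attribute
        range_m = device.slots.get("rangeMetres")  # radius standing in for an area
        if position is None or range_m is None:
            continue  # device has no spatial slots, so it cannot match
        if haversine_m(position[0], position[1], lat, lon) <= range_m:
            hits.append(device.lid)
    return hits


# Hypothetical devices.
devices = [
    Device("dev-001", {"position": (49.2800, -123.1200), "rangeMetres": 1000.0}),
    Device("dev-002", {"position": (49.3000, -123.1200), "rangeMetres": 500.0}),
    Device("dev-003", {"vendor": "Acme"}),
]

nearby = devices_covering(devices, 49.2810, -123.1200)
```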

A Modern Registry can use an ExternalLink as a means to access the location of a device and retrieve its raw data for processing. We call this novel usage of the ExternalLink feature “Data Virtualization”, where the Registry is acting as a proxy to the real physical location of the data.
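The proxy idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RegistryProxy`, `read_raw`, the example URI, and the in-memory `FAKE_STORE` that stands in for a real network fetch are all hypothetical names, not part of ebRIM or of any Registry product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalLink:
    device_lid: str    # the device record this link belongs to
    external_uri: str  # where the raw data physically lives


class RegistryProxy:
    """Resolves a device to its external URI and fetches the raw data."""

    def __init__(self, fetch):
        self._links = {}
        self._fetch = fetch  # callable taking a URI and returning bytes

    def register(self, link):
        self._links[link.device_lid] = link

    def read_raw(self, device_lid):
        # The caller only knows the device; the Registry knows the location.
        link = self._links[device_lid]
        return self._fetch(link.external_uri)


# Stand-in for remote storage so the sketch runs without a network.
FAKE_STORE = {"https://example.com/raw/dev-001": b"21.5,21.7,21.6"}

proxy = RegistryProxy(fetch=FAKE_STORE.__getitem__)
proxy.register(ExternalLink("dev-001", "https://example.com/raw/dev-001"))

raw = proxy.read_raw("dev-001")
```

Because clients ask the Registry for a device rather than a URL, the raw data can move without any client code changing.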

The ebRIM model allows the Modern Registry to create many Classification Schemes, and a device can be classified under multiple schemes at the same time. Thus, client applications of the Registry can allow different users to search for the same physical devices using their own classification criteria.

A Classification Scheme is like a tag list but, in the ebRIM world, the scheme is a tree of tags, or Classification Nodes. A Classification Node is also a subclass of Registry Object, which means it can have its own additional attributes, and these attributes can be spatial. The Registry can match the attribute values in a Classification Node against any newly created device object and automatically classify the new device. This can help speed up device classification and eliminate erroneous classifications, and hence improve data quality.
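Here is a small Python sketch of how attribute matching against a tree of Classification Nodes could work. The matching rule (a node matches when all of its criteria equal the device's attribute values, and the most specific matching nodes win) is my own assumption for illustration, as are the class, function, and scheme names.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationNode:
    code: str
    criteria: dict = field(default_factory=dict)  # attribute values a device must have
    children: list = field(default_factory=list)


def auto_classify(node, device_slots, prefix=""):
    """Return the paths of the most specific nodes whose criteria all match."""
    if any(device_slots.get(key) != value for key, value in node.criteria.items()):
        return []  # this branch does not apply to the device
    path = f"{prefix}/{node.code}"
    deeper = [
        match
        for child in node.children
        for match in auto_classify(child, device_slots, path)
    ]
    return deeper or [path]  # fall back to this node if no child matches


# Hypothetical scheme: DeviceType > Sensor > Temperature | Humidity, and Camera.
scheme = ClassificationNode("DeviceType", children=[
    ClassificationNode("Sensor", {"kind": "sensor"}, [
        ClassificationNode("Temperature", {"measures": "temperature"}),
        ClassificationNode("Humidity", {"measures": "humidity"}),
    ]),
    ClassificationNode("Camera", {"kind": "camera"}),
])

thermometer = {"kind": "sensor", "measures": "temperature"}
barometer = {"kind": "sensor", "measures": "pressure"}

thermometer_paths = auto_classify(scheme, thermometer)
barometer_paths = auto_classify(scheme, barometer)
```

In this sketch the thermometer lands on the Temperature node, while the barometer, which matches no leaf, stays at the more general Sensor node.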

The flexibility of an Extrinsic Object allows you to define the metadata for all your devices, no matter how different the devices are from each other, using the same ebRIM model construct. The ebRIM model makes it easy to manage, discover, and retrieve your devices’ metadata and their respective unstructured data.

When you build your metadata management application on a Modern Registry platform, you get all the benefits of the ebRIM model without the burden of maintaining the mapping between the ebRIM model and the physical data store schema. Removing that object-model-to-physical-data-model mapping from your application frees your development team to create more relevant business object models, higher-quality business logic, and more effective user interfaces.