Integration in a time of data pollution

Graham Oakes calls for greater care in data handling


IT security guru Bruce Schneier has said that data, the by-product of every computer-mediated interaction, is the pollution of the information age.

It stays around forever unless disposed of. In an essay published in February called The Architecture of Privacy, Schneier said: “Just as we look back at the beginning of the previous century and shake our heads at how the titans of the industrial age could ignore the pollution they caused, future generations will look back at us.”

I have spent much of my life trying to integrate data. Customer data in one system needs to connect to product data in another so we can deliver a decent service.

Plans in one system need to connect to specifications in another so we can build the right product. Or, as often as not, customer data in a dozen different systems needs to be linked together so we can simply count how many customers we have.

Such integration has always seemed valuable. By enabling staff to see the big picture, we helped them to make better decisions, to work together more effectively, to eliminate duplication, and so on. It wasn’t always easy – but it was valuable.

It is becoming clearer, however, that data integration carries larger costs, beyond those of data cleansing and transformation, integration hubs and the like.

Data integration creates serious concerns for customer and user privacy, both by increasing the amount of personal data that is accessible in any one place, and by increasing the ease with which third parties can access this information.

What is valuable for a company that holds the data may threaten the livelihoods of the people that the data describes.

Data integration also dramatically increases the impact of data losses. Each week brings fresh news of such losses – discs go missing in the post, laptops are left on trains and credit card processors are hacked.

This is a serious concern. We can tighten our information handling processes. We can train our people better. However, mistakes will always happen.

No system is infallible. Integrated data just increases the cost of these failures.

As we integrate data, we don’t just make it more valuable; we also make it more vulnerable.

We cannot ignore these costs any more. It is no longer enough to seek ways to integrate data across multiple systems.

Even as we do this, we need to design and build in ways to partition that data so it cannot be connected except by authorised people and for purposes for which we demonstrably have consent.
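
One way to picture this kind of partitioning is sketched below in Python. It is only an illustration under stated assumptions, not a reference design: the names PartitionedStore, Grant and link, the "crm" and "orders" partitions, and the "order_support" purpose are all hypothetical. Each system's records stay in their own partition, and a combined view is assembled only when the requesting user holds a grant for the stated purpose and the customer has consented to that purpose.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """Hypothetical record of who may link data, and for which purpose."""
    user: str
    purpose: str


@dataclass
class PartitionedStore:
    """Keeps each system's records apart; linking needs a grant and consent."""
    # partition name -> customer id -> that system's record (illustrative layout)
    partitions: dict[str, dict[str, dict]] = field(default_factory=dict)
    # (user, purpose) pairs that have been authorised
    grants: set[Grant] = field(default_factory=set)
    # customer id -> purposes the customer has consented to
    consents: dict[str, set[str]] = field(default_factory=dict)

    def link(self, user: str, purpose: str, customer_id: str) -> dict[str, dict]:
        """Return the customer's records from every partition, or refuse."""
        if Grant(user, purpose) not in self.grants:
            raise PermissionError(f"{user} is not authorised for {purpose!r}")
        if purpose not in self.consents.get(customer_id, set()):
            raise PermissionError(f"no consent from {customer_id} for {purpose!r}")
        return {
            name: records[customer_id]
            for name, records in self.partitions.items()
            if customer_id in records
        }


# Example data, invented purely for illustration.
store = PartitionedStore(
    partitions={
        "crm": {"c1": {"name": "A. Customer"}},
        "orders": {"c1": {"last_order": "widget"}},
    },
    grants={Grant("alice", "order_support")},
    consents={"c1": {"order_support"}},
)

# Alice is authorised and the customer has consented, so the link succeeds.
view = store.link("alice", "order_support", "c1")
```

In this sketch the default is refusal: a request from an unauthorised user, or for a purpose the customer never agreed to, raises PermissionError rather than returning a partial view. A real system would also need auditing, and enforcement at the storage layer rather than in application code alone.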

Data pollution is like the worst type of chemical smog. Each leak doesn’t just add to the smog, it interacts with what is already out there and multiplies the effects. It is time, I think, that we started to manage it seriously.

Graham Oakes is an independent IT projects and strategy consultant with a PhD in geophysics and remote sensing. He is also a chartered engineer and a fellow of the British Computer Society.