Danger in front of the screen
User error puts organisations at risk but few handle such problems correctly, says Paul Le Messurier
Picture this scene: the normally busy production lines of a successful food manufacturer stand idle while 20 delivery vans are lined up outside with nowhere to go.
The factory ceased to operate when its main IT systems failed, due to user error. The commercial pressure was tremendous because the company was unable to meet service-level agreements (SLAs) about delivering ingredients on time.
Luckily, the problem could be fixed remotely and the manufacturer was back at work within hours.
Large organisations rely on interconnected systems; problems caused by faulty hardware or user error can interrupt trading, with knock-on effects throughout their supply chains.
Yet most problems are caused by what is in front of the screen, not behind it. User error, including accidental deletion of virtual files and servers, or malicious damage to systems can bring organisations to their knees, even if they have the safest datacentres with mirrored server rooms a mile apart.
If they outsource all this responsibility to the channel, they expect the reseller to solve the problem quickly. But how many partners are actually up to scratch when it comes to data recovery capability?
In one case we worked on recently, a bank with operations in multiple European countries had done some routine maintenance work on its main transaction server. The server failed to reboot simply because a link between two clusters was not closed down during the maintenance work. Backup data on another server failed as well, and it took two days to restore the server.
In another case, the entire flight plans and costings of a national airline were affected by user error. About a month's worth of data was not backed up when routine maintenance destroyed the second part of the span sets; there was particular difficulty as the data was spread over two SAN systems.
Important associated snapshots were missing, but we were able to carry out virtual data recovery on a large 3.6TB NTFS Hyper-V server and restore defective files that made up approximately eight per cent of the total data on the system.
What can large organisations do to address the risks associated with user error? This is not a new issue, and organisations rarely have to call us twice for the same problem.
Yet such incidents are having more impact across organisational supply chains.
Problems occur for many reasons, but most relate to poor internal governance. While software vendors respond to market requirements for technology that is cloud-ready, easy to install and based on virtual platforms, this also increases the risk of accidental deletion of files, volumes or even virtual drives for companies that don't have the right policies in place.
Simply backing up corporate systems in this environment is not enough. Organisations need adequate data recovery plans, and must invest in skills and tools that ensure everyone is aware of potential user-error problems.
These skills, however, are becoming more expensive to find or acquire as technology becomes more complex. IT budgets are smaller and young in-house teams may not have the experience to know how to retrieve data when user error occurs. Two-day training courses normally cost about £3,000.
Organisations that fail to give the right guidelines to their users and IT teams may well fail to meet deadlines or SLAs, communicate with clients or suppliers, or carry out financial transactions. The channel can add value here.
Paul Le Messurier is programme and operations manager for data storage technologies at Kroll OnTrack