Visualisation of big data
Different advantages accrue from different types of data with visualisation tools, notes Rupert Ogilvie
Data visualisation uses graphic representations to explore and communicate relevant information drawn from big data. These representations can enable collaboration, help users infer connections, and help them draw conclusions that benefit their businesses.
Big data is about volume, variety and velocity of data. Standard data management techniques may be appropriate if just one 'V' is involved. For example, enormous data sets can be handled elegantly using properly configured relational databases, while variety and velocity can be handled by good process management and conventional business intelligence (BI) processes. However, big data management must cope with all three Vs converging.
Cloud computing is often talked about in the same breath as big data. It is important to remember that cloud does not necessarily mean public cloud such as Amazon EC2 or a SaaS service such as Salesforce.com. There are also private and hybrid cloud offerings hosted on internally shared platforms.
If real-time feeds providing data suddenly expand in volume due to an external event, cloud technology can be used to provision and use resources quickly enough to minimise the risk of data loss. All data can in theory be stored in the cloud, while the organisation using the data can choose how much it needs to pull back for presentation and further analysis.
This flexibility of resource use can be a problem for organisations when planning an upgrade path or budgeting for the next cloud bill.
Visualising this data can help organisations understand what was used and when, as well as track trends in data use over time for future-proofing purposes.
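As a simple illustration, the Python sketch below charts stored-data volume over time from a hypothetical usage export; the file name and its 'timestamp' and 'gb_stored' columns are assumptions made for the example rather than the output of any particular cloud provider.

# Illustrative sketch: charting cloud data use over time to spot trends.
# The CSV file and its 'timestamp' and 'gb_stored' columns are assumed
# for this example; real exports will differ by provider.
import pandas as pd
import matplotlib.pyplot as plt

usage = pd.read_csv("cloud_usage.csv", parse_dates=["timestamp"])

# Roll the raw records up to a daily peak so the trend is easier to read.
daily = usage.set_index("timestamp")["gb_stored"].resample("D").max()

# A seven-day rolling mean smooths out spikes and shows the underlying trend.
trend = daily.rolling(window=7).mean()

ax = daily.plot(label="daily peak (GB)", alpha=0.5)
trend.plot(ax=ax, label="7-day trend", linewidth=2)
ax.set_ylabel("GB stored")
ax.legend()
plt.show()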
Data visualisation is all about telling a story, and big data visualisation is no different. Tackling big data means billions of data points can be woven together to create business stories.
Users can also use such visualisation to spot anomalies and outliers or view data from many different sources using a common framework.
Users can cut into and move around the data at a granular level. Data that previously would have been discarded can be used to answer questions and support deeper analysis. Data can also be formed into subsets and groups, reducing data density and allowing rapid summaries of different sections of the data.
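As a rough sketch of what forming such subsets can look like in practice, the example below groups granular event records by region and hour to produce a compact summary for a visualisation layer; the column names and the 95th-percentile measure are illustrative assumptions, not a prescribed approach.

# Illustrative sketch: grouping granular records into subsets for rapid
# summaries. The 'region' and 'response_ms' columns are assumed for the
# example only.
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["timestamp"])

# Reduce data density: summarise individual rows into one row per region
# per hour, keeping only the figures the visualisation layer needs.
summary = (
    events
    .groupby(["region", pd.Grouper(key="timestamp", freq="h")])
    .agg(
        count=("response_ms", "size"),
        p95_ms=("response_ms", lambda s: s.quantile(0.95)),
    )
    .reset_index()
)

print(summary.head())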
Visualisation can show the state of a network or process at a single point, as well as stream the data in real time. Using advanced visualisation techniques it is possible to replay or rewind data to look for the cause of problems and track changes over time.
If a process has multiple inputs from different data sources, it is possible to see quickly whether the inputs in a process are being updated with sufficient regularity.
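One way such a regularity check might be pictured: the sketch below measures the gap between consecutive updates from each feed and flags any source whose worst gap exceeds an agreed threshold. The arrival log, its column names and the 15-minute threshold are assumptions for the example.

# Illustrative sketch: checking whether each input feed is updated with
# sufficient regularity. The arrival log and its 'source' and
# 'received_at' columns are assumed for this example.
import pandas as pd

arrivals = pd.read_csv("feed_arrivals.csv", parse_dates=["received_at"])

# For each source, measure the gap between consecutive updates.
arrivals["gap"] = (
    arrivals.sort_values("received_at")
    .groupby("source")["received_at"]
    .diff()
)

# Flag sources whose worst gap exceeds an agreed threshold, here 15 minutes.
threshold = pd.Timedelta(minutes=15)
worst_gap = arrivals.groupby("source")["gap"].max()
print(worst_gap[worst_gap > threshold])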
When planning new processes and thresholds, the ability to view the velocity of the required data sources can provide insight into the amount of work needed to scrub and clean the data.
Visualisation can be a kind of template that allows an organisation to have confidence that as new data sources become available, they can be fitted seamlessly into the overall framework. Customers can overlay and combine data from different sources in different views at different levels.
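To make that overlaying concrete, the sketch below aligns two hypothetical feeds on a shared time axis and plots them in one view; the file names, the 'mbps' and 'orders' columns and the nearest-timestamp join are assumptions chosen purely for illustration.

# Illustrative sketch: overlaying two data sources in a single view by
# aligning them on a shared time axis. File names and columns ('mbps',
# 'orders') are assumed for this example only.
import pandas as pd
import matplotlib.pyplot as plt

network = pd.read_csv("network_load.csv", parse_dates=["timestamp"])
orders = pd.read_csv("order_volume.csv", parse_dates=["timestamp"])

# Join each network reading to the nearest order-volume reading in time.
combined = pd.merge_asof(
    network.sort_values("timestamp"),
    orders.sort_values("timestamp"),
    on="timestamp",
)

ax = combined.plot(x="timestamp", y="mbps", label="network load (Mbps)")
combined.plot(x="timestamp", y="orders", ax=ax, secondary_y=True,
              label="orders per minute")
plt.show()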
Rupert Ogilvie is a consultant at Intergence