The problem of big data is an open-ended question without clear boundaries or definition. Yet it appears that open source may hold at least part of the key to some expansive, complex and fragmented business analytics projects.
Andrew Clegg, technical manager for data analytics and visualisation at Pearson Technology, said agile analytics offerings can be developed cost-effectively by harnessing open source tools as much as possible. Open source offerings can combine a wide range of applications and data sources, supporting customer growth and business transformation.
The open source approach helped Pearson cope with the hundreds or even thousands of events per second, day after day, that it needs to log and analyse, as well as with constantly changing attributes and fields. It also simplified licensing issues, he added.
From whoa to go
“Have you ever had this conversation? ‘We’ll need to add new event types and new properties every so often.’ The DBA will have to create a table and an index, alter the table, add a column, add a foreign key, and rebuild the [online analytical processing (OLAP)] cube,” Clegg said.
“And then you hear: ‘Oh, and we’d like to see new events as they happen in near-real time’, and that they are really not sure how many servers they will need a year from now or how many people will need access to the reports. You need it to be scalable, flexible, agile and near-real time. That’s a lot of buzzwords.”
Schema-free tools support agile product development as well as empowering the end user and saving the technical team time. Pearson also advocated JSON web service interfaces to improve interoperability. “Incremental index updates beat overnight bulk loads,” Clegg added. “Hadoop gives us huge flexibility for complex batch queries. That is the answer, or as close as you can get for a work in progress.”
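To make the schema-free idea concrete, here is a minimal Python sketch of how JSON events with changing fields might be indexed incrementally as they arrive. It is an illustration only and not Pearson's implementation: the `EventIndex` class, its methods and the sample event fields are all hypothetical, and a production system would use a dedicated search or NoSQL store rather than in-memory structures.

```python
import json
from collections import defaultdict


class EventIndex:
    """Toy in-memory store for schema-free JSON events (hypothetical example)."""

    def __init__(self):
        self.events = []
        # Maps a (field, value) pair to the positions of matching events.
        self.index = defaultdict(list)

    def add(self, raw_json):
        """Parse one JSON event and index it immediately, with no schema change."""
        event = json.loads(raw_json)
        position = len(self.events)
        self.events.append(event)
        for field, value in event.items():
            # Only scalar values are indexed; nested values are stored as-is.
            if isinstance(value, (str, int, float, bool)):
                self.index[(field, value)].append(position)
        return position

    def find(self, field, value):
        """Return every stored event whose field equals value."""
        return [self.events[i] for i in self.index.get((field, value), [])]


store = EventIndex()
store.add('{"type": "login", "user": "u1"}')
# A new event type with a new property needs no table or column changes.
store.add('{"type": "quiz_submitted", "user": "u1", "score": 87}')

logins = store.find("type", "login")
by_user = store.find("user", "u1")
```

Each event is searchable as soon as `add` returns, which is the contrast with an overnight bulk load, and the second event introduces a `score` field without anyone altering a table or rebuilding a cube.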
Neil Barry, UK country manager at BI vendor Jaspersoft, cited an open source project for promotions company Groupon. Groupon offers daily discounts on consumer products and services from a wide range of vendors via its email newsletters, customised for 45 countries. It has 10,000 staff and grew from nothing to 80 million subscribers in three years.
“They have a lot of data to manage - online marketing campaigns, offer information, customer data, finance data and so on. Microsoft Excel could not cope, which was what they were using. They had SAP systems and a big Oracle database and yet when you walked around the building, they generated reports with Excel,” Barry said. “That was going to hurt Groupon’s business.”
He explained that Jaspersoft developed a flexible, scalable global data warehouse for Groupon that complied with international data laws, using extract, transform and load (ETL) tools and BI built on open source, with reporting and OLAP. In 2012, Barry said, NoSQL handled 15TB of Groupon data such as mail and logs.
Twenty-five different data sources, in diverse media, from countries with different languages and notation, are combined and incorporated into 350 daily reports and real-time monitoring of marketing, finance, revenue, sales, subscribers, refunds, fraud, abandoned sales, special deals and so on.
“They are now able to examine trends based on time of day, location and the like, and answer questions such as: ‘What revenue was generated in the past hour from Facebook?’ Or: ‘What was the cost of that advertising?’” Barry said.
Clegg and Barry were speaking at IDC’s recent forum on big data projects and the explosion in popularity of social media.
Gartner claims that organisations must consider opening up their data to even more analysis. Big data can make organisations smarter, as it were, but open data - using open APIs and linked data exchange - will make them more profitable and competitive.
David Newman, research vice president at Gartner, said in a recent announcement that massive data volumes are enabling businesses to discover patterns and insight. But organisations - and the IT providers that support them - must go even further, he indicated.
“However, for clients seeking direct interactions with customers, partners and suppliers, open data is the solution,” he wrote.
“For example, more government agencies are now opening their data to the public web to improve transparency, and more commercial organisations are using open data to get closer to customers, share costs with partners and generate revenue by monetising information assets.”