Data scientists and explorers

What do big data professionals need to be able to do? Brian Gentile summarises the skills

The big data trend has done more than introduce a tidal wave of aggressive marketing campaigns. Although the "big" in big data is increasingly applied rather loosely, the trend has changed how organisations make decisions. Playing into that, we have relatively new and critical job descriptions.

One is what I like to call the data scientist, because such data volume, variety and velocity require professional data management, data mining and modelling expertise, with an emphasis on statistical and predictive analysis.

A data scientist also needs experience working with multi-structured types of data. These new skills can be learned through vendor training, or simply by rolling up the sleeves and building a pilot project that uses some of the newest big data technologies.

The next most important job at our company is the data explorer. This role spans several analytic skill levels, but we use it to describe someone with critical business domain knowledge, without which gaining new insight from big data is not possible.

Taken together, these roles can unify business analysis and IT.

The data scientist models and analyses the data in a wide variety of ways. He or she may not know what questions to ask of the data before analysing it, and the most valuable discovery may be working out which questions are relevant.

The data explorer is most interested in iterative discovery, probably on more constrained data sets better suited to specific data-driven business decisions. In other words, he or she can help answer pre-defined business questions.

Once live click-stream data (or any similar feed) is captured and plugged into a platform such as Apache HBase, which runs on Hadoop, or Apache Cassandra, both scientist and explorer can get to work. The data scientist may help prepare the data for use, perhaps using Hadoop MapReduce or a traditional ETL tool.
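To make the preparation step concrete, here is a minimal sketch of the map, sort and reduce pattern in plain Python, counting page views per URL. It does not use Hadoop itself, and the record layout (timestamp, user id and URL separated by tabs), the sample lines and the function names are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical click-stream records: "timestamp<TAB>user_id<TAB>url".
SAMPLE_LOG = [
    "2013-01-07T09:00:01\tu1\t/home",
    "2013-01-07T09:00:05\tu2\t/home",
    "2013-01-07T09:00:09\tu1\t/pricing",
    "malformed line",
    "2013-01-07T09:01:12\tu3\t/home",
]


def map_clicks(line):
    """Map step: emit (url, 1) for a well-formed record, nothing otherwise."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        return  # skip records that do not match the assumed layout
    _timestamp, _user_id, url = parts
    yield url, 1


def reduce_counts(url, counts):
    """Reduce step: total the counts emitted for one URL."""
    return url, sum(counts)


def run_job(lines):
    """Run map, then sort by key (the 'shuffle'), then reduce per key."""
    mapped = [pair for line in lines for pair in map_clicks(line)]
    mapped.sort(key=itemgetter(0))
    return dict(
        reduce_counts(url, (count for _, count in group))
        for url, group in groupby(mapped, key=itemgetter(0))
    )


page_views = run_job(SAMPLE_LOG)
print(page_views)
```

On a real cluster the map and reduce functions would run in parallel across many machines, but the division of labour is the same: the cleaned, aggregated output is the kind of constrained data set a data explorer can then query.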

The data explorer is then in a position to use an analytic or reporting tool to access, probe, or analyse the data.

Analytic and reporting tools designed for working with big data are becoming quite powerful and easy to use, even for the data explorer.

Many articles and discussions talk about the shortage of data scientists. What is not talked about enough is the shortage of data explorers. By this I mean that every business person should possess sound analytic skills if they are to thrive in this new, information-driven economy.

Many executives today do not possess an adequate analytic skill set, so this skills shortage will soon become the bigger overall problem, and it must be solved.

The tertiary education of people who will ultimately be employed in business functions should include a lot more analytics and information-based decision making.

Creating more data scientists and data explorers will assist the continued growth and success of big data projects.

Brian Gentile is chief executive at Jaspersoft