Data wrangling is an 'unpleasant' business, claims Splunk

Machine learning a 'difficult' practice, .conf2017 hears

Machine learning remains in the hype stage because putting the technology to practical use is still a challenge, especially where data wrangling is concerned, delegates at Splunk's .conf2017 heard in Washington this week.

During the event's opening keynote, Richard Campione, chief product officer at Splunk, noted that machine learning is getting a lot of buzz these days, but still has room to grow when it comes to adoption.

"Why don't we see it everywhere? Why is it mainly hype right now? Because it's difficult to do in practice. This is the challenge we're taking on," he said.

According to the executive, the first thing needed for machine learning to "be useful" is to wrangle data, which he said is "as hard as it sounds".

He pointed to a Splunk survey that found that 80 percent of data scientists' time is spent wrangling data, with over 70 percent saying this is the worst part of their job.

"These guys tend to be reasonably expensive people, so this is a time consuming, unpleasant and expensive business," Campione said.

This week, Splunk announced the Machine Learning Toolkit, a data science application used to predict future IT, security and business outcomes. Campione noted the offering includes a data prep feature to make data wrangling easier and work more pleasant for data scientists.

Campione also emphasised the importance training holds in the machine learning space.

"Machine learning, at the end of the day, is just math. Any old textbook has this in it," he told delegates. "That's not the magic. The magic are the coefficients that make it all work, and those are all different based on your environment, your business, your assessment of what kind of incident is detect-worthy [and] your assessment of how severe is that incidence. And you need to train based on the data, your environment and your business missions."

Driving value is also essential in the world of machine learning, Campione noted, adding that channel partners are able to use machine learning for "curated" experiences for customers.

Splunk also announced new machine learning capabilities in two new versions of its products. Splunk ITSI 3.0 unites service context with machine learning to help locate existing and potential problems, restore business-critical services and deliver analytics-driven IT operations, while Splunk UBA 4.0 adds the ability for users to identify insider threats.

What could be next for Splunk?

When asked what Splunk can do to aid channel partner businesses in regard to machine learning, Atif Ghauri, VP of customer success at Herjavec Group, pointed to the possibility of workflow automation, noting this is a hot topic in the security community, especially with security operations centers.

"There are questions as to whether or not that should be in Splunk. Should that be part of a platform or should it be part of an external platform?" Ghauri said.

"Where should that live? Splunk has capability to do that today, but to really get deeper into it would require further integration into the alerts. For example, 'if this type of alert comes out, it's going to require these seven response procedures, specific to that alert'. Right now, the workflow and determining which steps to take to correct the issue happen manually."

Outside of machine learning, the executive also pointed to the possibility of Splunk serving as a "caterer" for threat intelligence.

"There's a big challenge in the security community of sharing threat intelligence and indicators of compromise, and everybody wants to protect their own assets. But if there was a third party that can incorporate this data on behalf of the larger community, perhaps it can be built on Splunk," Ghauri explained.