Machine Learning and Threat Detection

Posted by Mark Greisiger

A Q&A with Jim Leonard of InfoArmor

One of the newer and potentially more promising weapons being deployed in the battle for cyber security is machine learning, in which systems can improve themselves based on experience and previous data. We asked Jim Leonard, Director with the Advanced Threat Intelligence unit at InfoArmor, for some insight on this technology and how it might help mitigate threats.

Machine learning is often called AI (artificial intelligence). How do these two differ?
While many struggle to pin down a clear difference, I think of machine learning as the building blocks of AI, and AI as a machine capable of performing tasks with characteristics of a human. Given training data and parameters, you can train a machine to recognize a chess board, the pieces, their movement and a player’s probability of winning, all based on moves made during a game. AI would be you playing a game of chess against the machine.

How is machine learning being used today for cyber security?
The general idea is to train machines to handle, review, and manage big data (enormous volumes) in order to identify anomalies, an approach that can be applied to any data set with any criteria. We see it being used mainly for anomaly detection and for harvesting and deciphering significant amounts of data in a multitude of formats.
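To make the idea of anomaly detection concrete, here is a minimal sketch in Python. It is not InfoArmor's method. It assumes a simple z-score test, and the function name, threshold, and hourly failed-login counts are all hypothetical values chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return (index, value) pairs from `observed` that lie more than
    `threshold` standard deviations from the mean of `baseline`."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is an anomaly.
        return [(i, v) for i, v in enumerate(observed) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(observed)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical data: failed logins per hour on a normal day...
baseline_counts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
# ...and four new hourly readings, one of which is a spike.
new_counts = [14, 13, 95, 12]

anomalies = find_anomalies(baseline_counts, new_counts)
print(anomalies)  # [(2, 95)]
```

Production systems use far richer models and many more signals, but the principle is the same: learn what normal looks like, then flag what deviates from it.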

Where do you get the data to feed these programs and how are you ‘training’ these machines to detect threats?
From a threat intelligence perspective, open source intelligence environments offer a tremendous amount of data. Our examples would include “seen before” data posted in open forums on the dark web or social channels. We also have “some” machine learning in maintaining status in those forums, but the difficult work of gaining and maintaining access to the lion’s share of dark web forums must be managed through human intelligence.

How accurate is the system in detecting threats? What about false flags?
It’s difficult to say as this is a new field. A great challenge in leveraging machine learning in a threat environment is identifying what is good vs. bad when you begin. If you don’t establish that up front, the machine could easily assume threat traffic is acceptable.
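The risk of a machine learning that threat traffic is acceptable can be sketched in a few lines of Python. This is an illustration under assumed conditions, not a description of any real product: the z-score test, the threshold, and both data sets are hypothetical. The same spike is flagged when the baseline is clean and missed when attack traffic was already present in the data the machine learned from.

```python
from statistics import mean, stdev


def is_anomalous(baseline, value, threshold=3.0):
    """True if `value` is more than `threshold` standard deviations
    from the mean of `baseline` (assumes baseline is not constant)."""
    return abs(value - mean(baseline)) / stdev(baseline) > threshold


# Hypothetical failed-login counts per hour.
clean_baseline = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
# Same period, but an attack was already under way during training.
tainted_baseline = [12, 95, 11, 90, 13, 98, 16, 92, 13, 94]

spike = 95
flagged_with_clean = is_anomalous(clean_baseline, spike)
flagged_with_tainted = is_anomalous(tainted_baseline, spike)

print(flagged_with_clean)    # True: the spike stands out
print(flagged_with_tainted)  # False: the spike looks normal
```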

What are some limitations or setbacks in using machine learning as opposed to relying on a human?
The short answer is “unknown.” The long answer, from our perspective, is that threat actors spend an inordinate amount of time securing their environments, harness the same tools we do, and don’t trust initial communications. It’s difficult (I want to say impossible, but who knows?) for a machine to understand the local slang of a foreign language and have the ability to communicate with or like a threat actor.

What lies ahead with this technology? Do we anticipate other applications beyond just detecting threats? 
It’s pure speculation at this time. It’s clear that machine learning holds great promise in terms of its ability to consume tremendous amounts of data and make autonomous decisions while involving less human interaction. That said, the movie WarGames came out in 1983 with a premise of artificial intelligence managing the U.S. nuclear arsenal, and in the film this arrangement brought the country to the brink of global thermonuclear war when all Joshua wanted to do was play a game.

In summary… 
We want to thank Jim Leonard for sharing his expert thoughts on this important topic. I have personally known Jim for many years and he is truly an industry thought leader on a variety of cyber risk management topics, and great at sharing thoughts via social media. He has the unique ability to present esoteric topics in layperson-friendly ways.

One thing we hear many clients struggle with in cybersecurity is threat detection, and specifically, how a small information security team can efficiently, consistently and effectively wade through a mile-deep pool of threat data. Cyber risk insurance carriers and their client risk managers want to have a comfort level that their IT folks have reasonable safeguards in place to mitigate threat exposure. Hopefully, ML/AI can assist here as the technology matures. As Jim points out, a major challenge for the technology to overcome is the various language issues. I myself have a hard time following some of the threat feeds, with industry jargon and internet acronyms that can be 2F4U. Keep an eye on this space to see how it evolves.