Machine Learning with Humans: Integrating Experts Into the Learning Process

Editor’s note: Adam is a speaker for ODSC Europe 2022. Be sure to check out his talk, “ML with Humans: Integrating Experts into the Learning Process,” there!

Active learning is a machine learning workflow in which experts are asked to review predictions made by an ML model, and their feedback is used to tune and improve that model. It is usually applied when a considerable amount of data is available, but only a relatively small portion of it can practically be labeled. The process goes as follows:

  1. First, an initial model is trained on a small amount of labeled data. This model will perform relatively poorly.
  2. The model then classifies the remaining unlabeled examples, which usually make up most of the data, and the confidence level of each prediction is recorded.
  3. The predictions with the lowest confidence are sent to the expert for labeling.
  4. The examples labeled by the expert are fed back to the model, which retrains on them and improves its performance as a result.
  5. Steps 2-4 are repeated until the model’s performance is satisfactory.
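
The five steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the workshop code: the one-dimensional threshold model, the `expert_label` function standing in for the human expert, the `TRUE_BOUNDARY` value, and the 0.98 accuracy target are all assumptions made up for the example.

```python
import math
import random

TRUE_BOUNDARY = 0.6  # hypothetical ground truth, known only to the "expert"


def expert_label(x):
    """Stand-in for the human expert: returns the true class of x."""
    return 1 if x > TRUE_BOUNDARY else 0


def fit_threshold(labeled):
    """Toy model: a threshold halfway between the two classes seen so far."""
    highest_negative = max(x for x, y in labeled if y == 0)
    lowest_positive = min(x for x, y in labeled if y == 1)
    return (highest_negative + lowest_positive) / 2


def confidence(x, threshold, sharpness=10.0):
    """Probability assigned to the predicted class (0.5 means a pure guess)."""
    p_positive = 1 / (1 + math.exp(-sharpness * (x - threshold)))
    return max(p_positive, 1 - p_positive)


def accuracy(threshold, examples):
    """Fraction of (x, label) pairs the threshold classifies correctly."""
    hits = sum((1 if x > threshold else 0) == y for x, y in examples)
    return hits / len(examples)


def active_learning(pool, labeled, test_set, target=0.98):
    pool, labeled = list(pool), list(labeled)
    threshold = fit_threshold(labeled)                # step 1: initial model
    history = [accuracy(threshold, test_set)]
    while pool and history[-1] < target:              # step 5: stop when good enough
        # steps 2-3: score the pool and pick the least confident prediction
        query = min(pool, key=lambda x: confidence(x, threshold))
        pool.remove(query)
        labeled.append((query, expert_label(query)))  # the expert labels it
        threshold = fit_threshold(labeled)            # step 4: retrain
        history.append(accuracy(threshold, test_set))
    return threshold, history


rng = random.Random(0)
pool = [rng.random() for _ in range(500)]             # unlabeled examples
test_set = [(x, expert_label(x)) for x in (rng.random() for _ in range(200))]
seed = [(0.0, 0), (1.0, 1)]                           # small initial labeled set
threshold, history = active_learning(pool, seed, test_set)
print(f"labels requested: {len(history) - 1}, accuracy: {history[-1]:.2f}")
```

The `history` list records the accuracy after every expert answer, so it can be plotted to see how performance changes with each iteration. Accuracy does not have to rise at every single step; the loop only guarantees that it stops once the target is reached or the pool is exhausted.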

Active learning reduces the number of examples that experts need to label. This makes it extremely useful when labeling data is hard, expensive, or time-consuming, or when it requires an expert’s opinion.

Integrating DDoS experts into the learning process

Distributed Denial of Service (DDoS) attacks work by overwhelming a victim’s resources, such as a web server’s CPU or network bandwidth. The resource becomes congested and nobody can use it, which means the server can no longer provide any services. This type of attack is considered easy to execute and is one of the most widely studied attacks in the cyber security domain.

To fight back against DDoS attacks, Imperva applies security logic to the network traffic that differentiates between benign and malicious traffic. Since this logic requires computational resources, it is turned on only when an anomalous amount of traffic is detected. To detect anomalous surges in traffic, Imperva uses a large set of values, called a security policy, that represents what normal amounts of traffic should look like. In the past, our security experts created each security policy by hand, looking at the network traffic and relying on their experience and heuristics. This complex and demanding process was repeated for each of the thousands of IP ranges we protect.
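
To make the idea of a security policy concrete, here is a small sketch of how a set of "normal traffic" values could be used to decide when to switch on the costly security logic. The article does not describe the real policy format, so the metric names, the limits, and both function names below are invented purely for illustration.

```python
# Hypothetical policy: the highest rate considered normal per traffic type.
# The metric names and numbers are invented for illustration only.
policy = {"tcp_syn_pps": 20_000, "udp_pps": 50_000, "dns_qps": 5_000}


def exceeded_limits(observed, policy):
    """Return the traffic types whose observed rate is above the policy."""
    return sorted(name for name, limit in policy.items()
                  if observed.get(name, 0) > limit)


def should_mitigate(observed, policy):
    """Switch on the costly traffic filtering only during an anomaly."""
    return bool(exceeded_limits(observed, policy))


quiet = {"tcp_syn_pps": 3_000, "udp_pps": 12_000, "dns_qps": 900}
surge = {"tcp_syn_pps": 250_000, "udp_pps": 14_000, "dns_qps": 950}
print(should_mitigate(quiet, policy))   # False
print(exceeded_limits(surge, policy))   # ['tcp_syn_pps']
```

Limits that are too low trigger the expensive logic on ordinary traffic, while limits that are too high let an attack through unnoticed, which is why choosing these values for every IP range takes expertise.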

To solve this problem, we trained an ML model that learned to create security policies the way our experts do. The model predicts a correct security policy in most cases. Whenever it is uncertain about a specific prediction, it marks that prediction as risky and sends it to a security expert for review. This lets experts spend less time creating security policies and more time on other important matters, such as researching cyber-attacks, while still maintaining control over the security policies.
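
The routing described here can be sketched as a simple confidence cut-off. This is an assumption-laden illustration, not Imperva's implementation: the `PolicyPrediction` class, the `triage` function, the 0.9 cut-off, and the example values are all hypothetical, and the IP ranges are reserved documentation addresses.

```python
from dataclasses import dataclass


@dataclass
class PolicyPrediction:
    ip_range: str
    policy: dict       # predicted policy values (hypothetical format)
    confidence: float  # model confidence in [0, 1]


def triage(predictions, min_confidence=0.9):
    """Split predictions into auto-approved ones and ones needing review."""
    approved, needs_review = [], []
    for prediction in predictions:
        if prediction.confidence >= min_confidence:
            approved.append(prediction)
        else:
            needs_review.append(prediction)  # marked risky, goes to an expert
    return approved, needs_review


predictions = [
    PolicyPrediction("192.0.2.0/24", {"udp_pps": 50_000}, 0.97),
    PolicyPrediction("198.51.100.0/24", {"udp_pps": 8_000}, 0.62),
    PolicyPrediction("203.0.113.0/24", {"udp_pps": 31_000}, 0.91),
]
approved, needs_review = triage(predictions)
print([p.ip_range for p in needs_review])  # ['198.51.100.0/24']
```

The cut-off controls the trade-off: raising it sends more policies to the experts, lowering it saves their time at the cost of less oversight.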

Mitigating DDoS attacks by YOURSELF!

During this workshop, participants will learn how to implement an active learning process. As part of the hands-on tutorial, we will use modAL, an active learning framework for Python 3. We will help a model determine whether a security policy was useful during a DDoS attack.

We will write code that will implement this cycle of active learning:

Fig 1: Active learning cycle. Image from modAL

Iteration by iteration, participants will see how the model learns to ask the best questions (i.e., to pick the examples with the highest uncertainty) and how its performance improves with each iteration it learns from.
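
One common way to measure "highest uncertainty" is least confidence: one minus the probability of the most likely class. The sketch below reimplements that idea with the standard library only, so its function names are not modAL's API, and the probabilities are made-up values for four hypothetical attacks.

```python
def least_confidence(probabilities):
    """Uncertainty of one prediction: 1 minus the top class probability."""
    return 1.0 - max(probabilities)


def most_uncertain(pool_probabilities, n_queries=1):
    """Indices of the n pool examples the model is least sure about."""
    ranked = sorted(range(len(pool_probabilities)),
                    key=lambda i: least_confidence(pool_probabilities[i]),
                    reverse=True)
    return ranked[:n_queries]


# Hypothetical [P(policy not useful), P(policy useful)] for four attacks.
pool_probabilities = [[0.95, 0.05], [0.55, 0.45], [0.20, 0.80], [0.48, 0.52]]
print(most_uncertain(pool_probabilities, n_queries=2))  # [3, 1]
```

The two examples closest to a 50/50 split are selected, since the expert's answer on those is expected to teach the model the most.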

Fig 2: Left: DDoS attack and the security policy. In this case, the security policy succeeded in catching the attack. Right: Model accuracy improvement along with iterations with an expert.

About the author/ODSC Europe 2022 speaker:

Adam Reichenthal, PhD is an experienced Data Scientist at Imperva’s threat research group where he works on creating machine learning algorithms to help protect Imperva’s customers against database attacks. Before joining Imperva, he obtained a PhD in Neuroscience from the Ben-Gurion University of the Negev.

ODSC Community

The Open Data Science community is passionate and diverse, and we always welcome contributions from data science professionals! All of the articles under this profile are from our community, with individual authors mentioned in the text itself.