Detecting Adversarial Attacks with Subset Scanning

Editor’s note: Celia Cintas, PhD, is a speaker for ODSC West 2022 this November 1st-3rd. Be sure to check out her talk, “A Tale of Adversarial Attacks & Out-of-Distribution Detection Stories in the Activation Space,” there to learn more about detecting adversarial attacks!

Deep neural networks are susceptible to adversarial perturbations of their input data that can cause a sample to be incorrectly classified. These perturbations contain small variations in the pixel space that cannot be detected by a human but can change the output of a classifier.

Reliably detecting attacks in a given set of inputs is of high practical relevance due to the vulnerability of neural networks to adversarial examples.  These altered inputs create a security risk in applications with real-world consequences, such as self-driving cars, robotics, and financial services.

One way to classify adversarial attacks is by their threat model, of which there are two main types: white-box and black-box. In the white-box approach, an attacker has complete access to the model, including its structure and trained weights. Several examples of white-box attacks are used in this work, such as the Basic Iterative Method (BIM), the Fast Gradient Sign Method (FGSM), and DeepFool (DF). In the black-box approach, an attacker can only access the outputs of the target model. You can generate attacks and test detection and defense mechanisms in Python with the Adversarial Robustness Toolbox.

Figure from [0].

In particular, we will showcase how to use the Subset Scanning Detector. This method treats neural networks as data-generating systems and applies anomalous pattern detection methods to their activation data. Subset Scanning can efficiently search over a large combinatorial space to find the groups of samples that differ the most from ‘expected’ behavior and could therefore contain an adversarially attacked image. If you’re interested in the methodology, you can check out [1].
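To give a feel for the scoring idea, here is a minimal NumPy sketch for a single input. It is a simplified illustration under stated assumptions, not ART's implementation: activations are synthetic Gaussian values, p-values are plain right-tail empirical p-values, and the function names, the alpha grid, and the Berk-Jones scoring function are choices made for this example. For one sample and a fixed threshold alpha, the best subset under this score keeps only the nodes whose p-value is at most alpha.

```python
import numpy as np

def empirical_pvalues(background, sample):
    """Right-tail empirical p-value of each node's activation against the background."""
    # background: (n_background, n_nodes), sample: (n_nodes,)
    n_background = background.shape[0]
    greater_or_equal = (background >= sample).sum(axis=0)
    return (greater_or_equal + 1) / (n_background + 1)

def berk_jones_subset_score(pvalues, alphas=np.linspace(0.01, 0.5, 50)):
    """Best Berk-Jones score over the alpha grid, keeping only nodes with p <= alpha."""
    best = 0.0
    for alpha in alphas:
        n_alpha = int((pvalues <= alpha).sum())
        # Every node in the subset is significant, so N_alpha / N = 1
        # and the KL term reduces to log(1 / alpha).
        best = max(best, n_alpha * np.log(1.0 / alpha))
    return best

# Synthetic stand-ins for layer activations (assumption, not real network data).
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 100))   # activations of clean background data
clean = rng.normal(size=100)               # a sample that follows expected behavior
attacked = rng.normal(size=100)
attacked[:30] += 3.0                       # 30 nodes with unusually high activations

clean_score = berk_jones_subset_score(empirical_pvalues(background, clean))
attacked_score = berk_jones_subset_score(empirical_pvalues(background, attacked))
print(clean_score, attacked_score)
```

The sample with a group of shifted activations receives the higher score, which is the signal the detector relies on.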

Some goodies about this type of approach:

  • We can provide attack detection capabilities at run time.
  • We can abstract from domains (audio, image, tabular data) and focus only on the deep representation of the input.
  • There is no need to re-train the model or to have labeled examples of adversarial attacks when building our detector.

The Adversarial Robustness Toolbox (ART) is a Python library for Machine Learning Security. ART is hosted by the Linux Foundation AI & Data Foundation (LF AI & Data). ART provides tools that enable developers and researchers to defend and evaluate Machine Learning models and applications against the adversarial threats of Evasion, Poisoning, Extraction, and Inference.

Below, we will show an example of how to use ART to build experiments around Adversarial Attack detection with Subset Scanning Methods. 

Make sure to install these packages in your virtualenv:

pip install keras adversarial-robustness-toolbox pillow tensorflow seaborn

For this example we will implement multiple steps:

1. Load the dataset that we want to use.

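from art.utils import load_dataset

# Load MNIST together with the minimum and maximum pixel values
(x_train, y_train), (x_test, y_test), min_, max_ = load_dataset("mnist")
# Keep a small subset so the example runs quickly
x_train, y_train = x_train[:5000], y_train[:5000]
x_test, y_test = x_test[:1000], y_test[:1000]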

2. Build the model that we want to attack. This is a simple CNN from the Keras examples. After we build our model, it is important to wrap it in the KerasClassifier from the ART library.

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from art.estimators.classification import KerasClassifier
import tensorflow as tf

# Assumption: with TensorFlow 2, ART's KerasClassifier needs eager execution disabled
tf.compat.v1.disable_eager_execution()

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=x_train.shape[1:]))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten the feature maps so the dense layers output one vector of class scores per image
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# Wrap the Keras model so that ART attacks and detectors can use it
classifier = KerasClassifier(model=model, clip_values=(min_, max_))
classifier.fit(x_train, y_train, nb_epochs=5, batch_size=128)

3. Next, we need to generate attacks against the chosen model. In this case we will use the Fast Gradient Method, but there are several other attacks available in ART.

from art.attacks.evasion.fast_gradient import FastGradientMethod

attacker = FastGradientMethod(classifier, eps=0.5)
x_train_adv = attacker.generate(x_train)
x_test_adv = attacker.generate(x_test)
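To make this step more concrete, here is a minimal NumPy sketch of a single fast-gradient-sign update on a toy logistic-regression model. The weights, input, label, and eps value are hypothetical and chosen only for illustration; this is not how ART implements the attack internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps, clip_min=0.0, clip_max=1.0):
    """One fast-gradient-sign step against a logistic-regression model."""
    # Gradient of the binary cross-entropy loss with respect to the input x.
    grad_x = (sigmoid(w @ x + b) - y) * w
    # Move every feature by eps in the direction that increases the loss.
    x_adv = x + eps * np.sign(grad_x)
    # Keep the perturbed input inside the valid value range.
    return np.clip(x_adv, clip_min, clip_max)

# Hypothetical toy model and input (not taken from the article).
w = np.array([2.0, -3.0, 1.0])
b = 0.0
x = np.array([0.6, 0.2, 0.5])
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.3)
pred_clean = int(sigmoid(w @ x + b) > 0.5)
pred_adv = int(sigmoid(w @ x_adv + b) > 0.5)
print(pred_clean, pred_adv)
```

In this toy setting, no feature moves by more than eps, yet the predicted class flips from 1 to 0.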

4. Now we’re ready to test our subset scanning detector. We need to give the detector the model that we want to scan over (classifier), the background data from which activations are extracted to build the expected behavior, and the layer that we want to scan over.

from art.defences.detector.evasion.subsetscanning import SubsetScanningDetector

detector = SubsetScanningDetector(classifier, x_train, layer=1)
clean_scores, adv_scores, dpwr = detector.scan(x_test, x_test_adv)

5. As a result, the detector returns the scores assigned to the clean and adversarial samples, along with the detection power of our scanning method on the given test data.

These scores can be plotted with seaborn and matplotlib. In the plot we can see a clear separation between the two distributions, which corresponds to a high detection power (in this case dpwr=0.999978).

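import seaborn as sns
import matplotlib.pyplot as plt

# Density of the subset scores for clean and attacked images
sns.kdeplot(clean_scores, fill=True, label='clean images')
sns.kdeplot(adv_scores, fill=True, label='attacked images')
plt.title('Distribution of Subset Scores for layer 1')
plt.xlabel('Subset Score')
# Show the labels defined above and render the figure
plt.legend()
plt.show()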
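One way to read the detection power is as a measure of how well the scores separate the two groups. The sketch below assumes that detection power corresponds to the area under the ROC curve when subset scores are used to rank attacked samples above clean ones. The helper name is hypothetical and is not part of ART.

```python
import numpy as np

def detection_power_auc(clean_scores, adv_scores):
    """Probability that a random attacked sample scores higher than a random clean one."""
    clean = np.asarray(clean_scores, dtype=float)
    adv = np.asarray(adv_scores, dtype=float)
    # Compare every attacked score with every clean score; ties count as half a win.
    wins = (adv[:, None] > clean[None, :]).sum()
    ties = (adv[:, None] == clean[None, :]).sum()
    return (wins + 0.5 * ties) / (len(adv) * len(clean))

# Perfectly separated scores give 1.0, identical score sets give 0.5.
separated = detection_power_auc([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
overlapping = detection_power_auc([1.0, 2.0], [1.0, 2.0])
print(separated, overlapping)
```

Under this reading, a value close to 1 means that almost every attacked image received a higher subset score than almost every clean image.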



[0] Chen, J., Jordan, M.I. and Wainwright, M.J., 2020, May. HopSkipJumpAttack: A query-efficient decision-based attack. In 2020 IEEE Symposium on Security and Privacy (SP) (pp. 1277-1294). IEEE.

[1] Cintas, C., Speakman, S., Akinwande, V., Ogallo, W., Weldemariam, K., Sridharan, S. and McFowland, E., 2020. Detecting Adversarial Attacks via Subset Scanning of Autoencoder Activations and Reconstruction Error. IJCAI 2020 – Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Main track (pp. 876-882).

About the author/ODSC West speaker on detecting adversarial attacks:

Celia Cintas is a Research Scientist at IBM Research Africa – Nairobi. She is a member of the AI Science team at the Kenya Lab. Her current research explores subset scanning for anomalous pattern detection under generative models and the improvement of ML techniques to address challenges in Global Health. Previously, she was a grantee of the National Scientific and Technical Research Council at LCI-UNS and IPCSH-CONICET. She holds a Ph.D. in Computer Science from Universidad del Sur (Argentina). More info: https://celiacintas.github.io/about/


