

Adversarial Attacks on Deep Neural Networks
Posted by Elizabeth Wallace, ODSC | June 11, 2019

Our deep neural networks are powerful machines, but what we don’t understand can hurt us. As sophisticated as they are, they’re highly vulnerable to small attacks that can radically change their outputs. As we go deeper into the capabilities of our networks, we must examine how these networks really work to build more robust security.
At ODSC East 2019, Sihem Romdhani of Veeva Systems outlined how these networks remain highly vulnerable despite their power, and how it's precisely their opaque inner workings that make it so challenging to build safer networks. We can't keep rushing toward bigger, deeper models without sufficient security, or we will pay the price.
What is an Adversarial Attack?
Humans are great at filtering out noise and perturbations. Deep neural networks, however, are extremely literal, and it takes very little noise to fool a trained network. While we may agree that two nearly identical pictures both show a pig, a small amount of noise imperceptible to our eye can cause the network to label one a pig and the other an airliner.
For image classification, the most common architecture is the convolutional neural network, which learns to classify through stacked, connected layers during training. An attacker can manipulate images using knowledge of the trained model and the goal of the attack. A targeted attack, for example, perturbs the input image so that the classifier outputs a specific class of the attacker's choosing; the machine is made to see whatever the attacker wants. In some cases, this is possible by changing only one pixel.
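To make this concrete, here is a minimal sketch of a targeted, gradient-based attack in the spirit of the fast gradient sign method (a common technique, not one attributed to the talk); the pretrained `model`, the `image` tensor, and the chosen `target_class` are hypothetical placeholders.

```python
# A minimal sketch of a targeted, FGSM-style attack, assuming a
# hypothetical pretrained classifier `model` and an input image tensor
# of shape (1, 3, H, W) with pixel values in [0, 1].
import torch
import torch.nn.functional as F

def targeted_fgsm(model, image, target_class, epsilon=0.01):
    """Nudge `image` so the model predicts `target_class`."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)
    # Minimize the loss w.r.t. the *target* label, so we step against
    # the gradient (descent toward the attacker's chosen class).
    loss = F.cross_entropy(logits, torch.tensor([target_class]))
    loss.backward()
    adversarial = image - epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()
```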
These attacks aren’t noise. Noise is random or uncontrolled interference. Hackers can control perturbations so that they aren’t detectable to standard noise filters. This is what makes these attacks so dangerous.
[Related article: The Importance of Explainable AI]
3D Images Are Also Vulnerable
Some claim that this isn't a problem for 3D object recognition, but can we really secure the model? Unfortunately, no. A recent experiment showed that even a 3D-printed turtle could be classified as a rifle after only small changes to its surface texture.
Most of the attacks described so far assume access to the model's parameters, so could hiding those make these models safer? Again, no.
Black-box attacks can fool deep neural networks even when both the model's parameters and its training data are hidden. By observing only the model's outputs, you can still fool it. Even more worrying, adversarial examples often transfer across models, as long as those models are trained for the same task.
The substitute model technique is one example of this. You generate your own input data, query the black-box model to label it, and use those labels to train a substitute model. Once the substitute is trained, you can craft adversarial examples against it by adding small perturbations to the inputs; because the substitute approximates the decision boundary of the unknown black-box target, those examples frequently transfer.
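A rough sketch of that workflow might look like the following; `query_black_box` and the `substitute` network are hypothetical stand-ins for the unknown target and the local copy, not an API from the talk.

```python
# A rough sketch of the substitute-model idea, assuming a hypothetical
# `query_black_box(x)` that returns the target model's predicted labels
# and a small local `substitute` network we can differentiate through.
import torch
import torch.nn.functional as F

def train_substitute(substitute, synthetic_inputs, query_black_box,
                     epochs=10, lr=1e-3):
    labels = query_black_box(synthetic_inputs)   # label data via the black box
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(synthetic_inputs), labels)
        loss.backward()
        opt.step()
    return substitute

def transfer_attack(substitute, image, true_label, epsilon=0.03):
    # Craft the perturbation on the substitute; because it approximates
    # the black box's decision boundary, the example often transfers.
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(substitute(image), torch.tensor([true_label]))
    loss.backward()
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```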
Other Adversarial Attacks Beyond Image Data
Voice and text data are harder to attack, but not impossible. In speech recognition, for example, an attacker can add a small amount of audio noise, analogous to a pixel perturbation in image data, that causes a wrong prediction or a prediction the attacker specifically wants. If you have a device like Alexa, an attacker could potentially gain access to your system simply through music playing in your home that carries hidden commands.
Natural language understanding is also at risk. In sentiment analysis, it's possible to change a few characters, sometimes even a single character, and flip the predicted sentiment from positive to negative. An attack on a business or organization could be launched simply by altering how its posts are analyzed.
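As a toy illustration, and not a method from the talk, the sketch below brute-forces single-character swaps until the sign of a sentiment score flips; `score_sentiment` is an assumed stand-in for whatever classifier is being attacked.

```python
# A toy character-level attack on sentiment analysis: swap one character
# at a time and return the first edit that flips the sign of the score.
# `score_sentiment` is a hypothetical function returning a polarity in [-1, 1].
def flip_sentiment(text, score_sentiment):
    original_sign = score_sentiment(text) > 0
    for i, ch in enumerate(text):
        for replacement in "abcdefghijklmnopqrstuvwxyz":
            if replacement == ch.lower():
                continue
            candidate = text[:i] + replacement + text[i + 1:]
            if (score_sentiment(candidate) > 0) != original_sign:
                return candidate  # a single character was enough
    return None  # no single-character flip found
```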
[Related article: Innovators and Regulators Collaborate on Book Tackling AI’s Black Box Problem]
Making More Robust Models
So how do we begin to make our models more robust in response to these adversarial attacks? Here are some of the steps Romdhani suggests taking to harness the full power of deep neural networks while making them more secure.
Modifying Training of the Model
Adversarial training has shown promising results for building more robust models. At each training iteration, you use the current state of the model to generate adversarial examples from the original inputs and train on them alongside the clean data. This increases robustness and reduces overfitting.
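A minimal sketch of one such training epoch, assuming a generic PyTorch `model`, data `loader`, and `optimizer`, and using an FGSM-style step to generate the adversarial examples (one common choice, not necessarily the one from the talk):

```python
# A minimal sketch of adversarial training: adversarial examples are
# generated from the model's *current* state at every iteration and
# included in the training loss alongside the clean inputs.
import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=0.03):
    for x, y in loader:
        # Generate adversarial examples from the current model state.
        x_adv = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

        # Train on the original and adversarial inputs together.
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```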
Modifying the Network
Network distillation can also make a network more robust. Distillation was originally designed to transfer knowledge from a large, complicated network to a smaller one, such as a model deployed on a smartphone, but here you use the model's own knowledge to increase security. First, train a network on labels represented as one-hot vectors. Then, instead of those hard labels, take the probability outputs of that first network and use them, with the same inputs, to train a second network. Transferring these softened probabilities increases robustness.
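A condensed sketch of that second stage, assuming an already trained `teacher`, a fresh `student` of the same architecture, and ordinary PyTorch training objects; the temperature `T` is the knob that softens the probabilities.

```python
# A condensed sketch of distillation-style training: the teacher's
# softened probabilities replace one-hot labels as the student's targets.
import torch
import torch.nn.functional as F

def distill_epoch(teacher, student, loader, optimizer, T=20.0):
    teacher.eval()
    for x, _ in loader:
        with torch.no_grad():
            soft_targets = F.softmax(teacher(x) / T, dim=1)  # softened labels
        optimizer.zero_grad()
        log_probs = F.log_softmax(student(x) / T, dim=1)
        # Cross-entropy against the soft targets instead of one-hot labels.
        loss = -(soft_targets * log_probs).sum(dim=1).mean()
        loss.backward()
        optimizer.step()
```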
Adding on Networks
Perturbation Rectifying Networks (PRNs) are trained on real and synthetic image perturbations and are prepended as extra layers to the input of the trained classifier. A query image first passes through the PRN, and a detector checks whether it contains a perturbation. If a perturbation is detected, the output of the PRN, rather than the raw image, is used for label prediction. This method is mainly used to detect universal perturbations.
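Schematically, the detect-and-rectify pipeline can be pictured as below; `detector`, `prn`, and `classifier` are assumed placeholder modules rather than the exact architecture from the original work.

```python
# A schematic of the detect-and-rectify pipeline described above, with
# placeholder modules for the rectifier, detector, and classifier.
import torch

def robust_predict(image, detector, prn, classifier, threshold=0.5):
    rectified = prn(image)                  # PRN removes suspected perturbations
    # The detector decides whether the query image looks perturbed,
    # e.g. by comparing the image with its rectified version.
    if detector(image, rectified) > threshold:
        return classifier(rectified)        # use the PRN output for the label
    return classifier(image)                # otherwise classify the raw image
```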
Moving Forward With Security
We need more open-source platforms for evaluating networks specifically against adversarial attacks. We also need more data for testing networks against different types of perturbations. Finally, we need to understand how these networks actually work. As long as they remain black boxes, it will be difficult to identify their weaknesses. If we dive more deeply into how they work, we may be able to understand how these attacks unravel the training and alter decisions.
These are powerful networks, but we must focus our efforts on security and reliability rather than only on answering ever more questions with them. It's tempting to keep riding the magic of deep neural networks toward bigger, better AI, but spending time understanding our creations could help us build more secure systems.