
Adversarial attacks

Updated: May 30, 2021

Adversarial attacks can compromise the functioning of deep learning methods, with potentially dramatic consequences in safety-critical scenarios.



Self-driving cars are controlled by cooperating deep learning systems that need to be highly accurate to protect the safety of the passengers as well as of the other entities on the road. Unfortunately, several experiments have shown that, for instance, state-of-the-art deep learning image classification networks are susceptible to adversarial attacks: small, carefully crafted perturbations added to the input can significantly mislead the employed deep learning methods.

In safety-critical scenarios such as driverless systems, the consequences can be dramatic, potentially leading to dangerous situations.

Digital and Physical Adversarial Attacks


Adversarial attacks differ depending on whether they are carried out in the digital domain or in the physical domain. In the former, adding a precisely chosen, imperceptible-to-humans noise layer to an input image can cause the model to misclassify it. Nevertheless, since the perturbation must be injected by intercepting and modifying on the fly the data transmitted by the sensor before it reaches the deep learning model, this type of attack is usually difficult to mount and therefore rare.


Figure made for illustrative purposes. For more details on the study that inspired it, refer to [4].
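To make the digital case concrete, the snippet below is a minimal sketch of a white-box perturbation in the spirit of the fast gradient sign method of [4]. It assumes a pretrained PyTorch classifier `model` and a batch of normalized `images` with ground-truth `labels`; these names are placeholders, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=0.01):
    """Return copies of `images` perturbed with a single FGSM-style step."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Nudge every pixel by epsilon in the direction that increases the loss.
    adv_images = images + epsilon * images.grad.sign()
    return adv_images.clamp(0, 1).detach()
```

Even a very small epsilon is often enough to flip the predicted class while keeping the perturbation invisible to a human observer.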

Conversely, directly altering the physical environment is far more feasible. Road elements, including road markings and traffic signs, can easily be perturbed in ways that confuse deep learning systems while remaining perfectly understandable to humans.



In the paper by Eykholt et al. [1], traffic sign classification systems were successfully misled by physically applying black and white stickers, resembling forms of vandalism such as graffiti, to the signs. For example, the authors caused a classifier to interpret a stop sign as a Speed Limit 45 sign.


Although physical attacks are easier to carry out than digital ones, they must overcome a number of challenges. First of all, their effectiveness must hold across different distances and angles with respect to the camera observing the environment. Furthermore, unlike digital attacks, where the perturbation is deliberately hard to detect, here it must be large enough to be perceived by the camera.


However, the authors of the aforementioned study demonstrated that it is possible to generate physical adversarial attacks that remain robust across different angles and distances.
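The snippet below gives a rough idea (under heavy simplification, and not the authors' exact algorithm) of how such robustness can be pursued: a sticker-shaped perturbation is optimized while the sign image is randomly rotated and rescaled at every step, simulating different viewing angles and distances. `model`, `sign_image`, `true_label`, and `sticker_mask` are assumed placeholders.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def optimize_sticker(model, sign_image, true_label, sticker_mask,
                     steps=500, lr=0.05):
    """Optimize a masked perturbation that survives random viewpoints."""
    delta = torch.zeros_like(sign_image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    random_view = T.RandomAffine(degrees=30, scale=(0.5, 1.5))  # pose changes
    for _ in range(steps):
        perturbed = (sign_image + sticker_mask * delta).clamp(0, 1)
        view = random_view(perturbed)  # a different distance/angle each step
        # Maximize the loss of the true class so that misclassification
        # holds on average over the sampled viewpoints.
        loss = -F.cross_entropy(model(view.unsqueeze(0)),
                                torch.tensor([true_label]))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (sticker_mask * delta).detach()
```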


Adversarial Attack Typologies


Attacks can generally be classified into two categories, namely:

  • black-box attacks, where the malicious entity does not have access to any internal information regarding the model (such as its parameters or gradients);

  • white-box attacks, where the attacker knows the parameters and gradients of the victim network.

Additionally, attacks can be further distinguished according to whether the perturbation is designed to steer the model towards a specific output (targeted attack) or simply towards any output different from the true one (untargeted attack).
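To contrast with the gradient-based (white-box) sketch shown earlier, the following is a deliberately naive black-box, untargeted attack that only queries the model's predictions: it samples random bounded perturbations and returns the first one that changes the predicted label. A targeted variant would instead stop only when the prediction equals a chosen target class. `model`, `image`, and `true_label` are assumed placeholders.

```python
import torch

def random_query_attack(model, image, true_label, epsilon=0.05, queries=1000):
    """Sample random +/-epsilon perturbations until the label flips."""
    with torch.no_grad():
        for _ in range(queries):
            noise = epsilon * torch.randn_like(image).sign()
            candidate = (image + noise).clamp(0, 1)
            pred = model(candidate.unsqueeze(0)).argmax(dim=1).item()
            if pred != true_label:  # untargeted success criterion
                return candidate
    return None  # no adversarial example found within the query budget
```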


The Impact on Mentioned Cases


Adversarial attacks that alter the understanding an autonomous vehicle has of its surrounding environment significantly affect the perception task and, consequently, the behavioural planning process. For instance, in the context of computer vision tracking, a system can be deceived about the type of object identified and therefore about its expected behaviour. Adding further sensors, as in the LaserNet++ approach discussed in one of the blog articles, can in some cases correct the errors made in a specific domain through redundancy. As for the 2018 study by the London company Wayve, since the entire driverless system relies on a single monocular camera, it is certainly susceptible to physical alterations of traffic signs and of horizontal markings, including lane lines.


In conclusion, it is reasonable to expect that future research will focus mainly on dealing with adversarial attacks performed in the physical environment. Since it is unfeasible to remove every disturbance that afflicts traffic signs, the goal is to build models that are as robust as possible to this type of noise.




References

  1. Eykholt, Kevin, et al. "Robust physical-world attacks on deep learning visual classification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

  2. Chakraborty, Arunava. "Introduction to Adversarial Machine Learning." FloydHub Blog, 6 Nov. 2019, blog.floydhub.com/introduction-to-adversarial-machine-learning.

  3. Early, Joseph. "Your Car May Not Know When to Stop — Adversarial Attacks Against Autonomous Vehicles." Medium, 20 Sept. 2019, towardsdatascience.com/your-car-may-not-know-when-to-stop-adversarial-attacks-against-autonomous-vehicles-a16df91511f4.

  4. Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).


The images in the blog are either copyright free or designed from scratch.


