Search-Based Robustness Testing for Traffic Sign Classifiers

This project is not to be copied, published, or used in any other form by others than the authors!

This project explores a search-based methodology for robustness testing of Deep Learning traffic sign classifiers against so-called sticker attacks.


These attacks manipulate traffic signs in order to trick DNNs into false predictions and can therefore provoke hazardous system failures in autonomous vehicles. While other types of attacks, such as adversarial attacks, are conducted in a fully observable and highly controlled environment where individual pixels are manipulated in very specific ways through backpropagation, sticker attacks describe a case that is more likely to actually be encountered in the real world. The search-based approach of this project can treat the classifiers as a black box where only the inputs and outputs of the model are relevant, which makes this kind of attack dangerous but also highly relevant for the hardening of classification models.

Assuming that a more robust model can withstand larger stickers, we explore different search algorithms (brute-force, genetic, etc.) to find the minimal stickers that cause misclassifications. We compare the performance of the different search algorithms and of the fitness functions that guide them, and provide a statistical analysis.
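As a rough illustration of the exhaustive variant of this search, the sketch below tries single-colored square stickers of growing size on a tiny grayscale image and stops at the first one that changes the prediction. It is not this project's implementation: `apply_sticker`, `smallest_fooling_sticker`, and `toy_classifier` are hypothetical names, the image is a nested list rather than a PyTorch tensor, and the classifier is a stand-in that is only queried through its output label (the black-box setting).

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # assumption: grayscale image as rows of pixel values


def apply_sticker(image: Image, x: int, y: int, size: int, value: int) -> Image:
    """Return a copy of `image` with a size x size square set to `value`."""
    patched = [row[:] for row in image]
    for row in range(y, y + size):
        for col in range(x, x + size):
            patched[row][col] = value
    return patched


def smallest_fooling_sticker(
    image: Image,
    true_label: int,
    classify: Callable[[Image], int],
    value: int = 0,
) -> Optional[Tuple[int, int, int]]:
    """Exhaustively search for the smallest square sticker that changes the label.

    Returns (size, x, y) of the first hit, or None if no sticker fools the model.
    """
    height, width = len(image), len(image[0])
    for size in range(1, min(height, width) + 1):  # smallest stickers first
        for y in range(height - size + 1):
            for x in range(width - size + 1):
                patched = apply_sticker(image, x, y, size, value)
                if classify(patched) != true_label:  # only the output is used
                    return size, x, y
    return None


def toy_classifier(image: Image) -> int:
    """Hypothetical stand-in model: class 1 if at least ten pixels are white."""
    total = sum(sum(row) for row in image)
    return 1 if total >= 255 * 10 else 0


sign = [[255] * 4 for _ in range(4)]  # 4x4 all-white "sign" of class 1
print(smallest_fooling_sticker(sign, 1, toy_classifier))  # (3, 0, 0)
```

On this toy input, 1x1 and 2x2 stickers leave at least ten white pixels, so the first sticker that flips the label is the 3x3 one at the top-left corner. A real run replaces `toy_classifier` with a call to the trained model.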

This project makes use of two major optimization frameworks for Python (JMetalPy and Pymoo) and uses PyTorch for data handling and DNNs.

Currently only one classifier (link to GitHub repo) is analyzed, but further models can be added easily. This model is trained on the German Traffic Sign Recognition Benchmark (GTSRB), as it is one of the most popular datasets for this classification problem.

The Pymoo and JMetalPy algorithms can optimize up to N colored, rectangular stickers (please note that the multi-sticker variants are not fully configured and therefore often perform worse than the single-sticker variant). The optimal solution can only be computed for one sticker per attack, as the search is too computationally expensive for multiple stickers.
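One way to picture what such an optimizer searches over is a flat integer vector with seven values per sticker. The sketch below is an assumption for illustration, not the encoding or fitness function used in this project: `Sticker`, `decode`, and `fitness` are hypothetical names, and the fitness simply adds the covered fraction of the image to a weighted confidence in the true class, so lower values mean smaller stickers that fool the model more.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

GENES_PER_STICKER = 7  # assumed layout: x, y, width, height, r, g, b


@dataclass(frozen=True)
class Sticker:
    x: int
    y: int
    width: int
    height: int
    color: Tuple[int, int, int]  # (r, g, b), each in 0..255

    @property
    def area(self) -> int:
        return self.width * self.height


def decode(genome: Sequence[int]) -> List[Sticker]:
    """Split a flat integer vector into stickers, seven genes per sticker."""
    if len(genome) % GENES_PER_STICKER != 0:
        raise ValueError("genome length must be a multiple of 7")
    stickers = []
    for start in range(0, len(genome), GENES_PER_STICKER):
        x, y, w, h, r, g, b = genome[start:start + GENES_PER_STICKER]
        stickers.append(Sticker(x, y, w, h, (r, g, b)))
    return stickers


def fitness(
    genome: Sequence[int],
    true_class_confidence: float,
    image_area: int,
    penalty: float = 1.0,
) -> float:
    """Lower is better: covered fraction plus weighted true-class confidence.

    Overlap between stickers is ignored, so the area is an upper bound.
    """
    covered = sum(sticker.area for sticker in decode(genome))
    return covered / image_area + penalty * true_class_confidence


two_stickers = [2, 3, 4, 5, 255, 0, 0, 10, 10, 2, 2, 0, 0, 0]
print(decode(two_stickers))
print(fitness(two_stickers, true_class_confidence=0.25, image_area=32 * 32))
```

In practice `true_class_confidence` would come from the classifier's output for the patched image. Each additional sticker adds seven dimensions to the search space.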

This project was developed as part of the Advanced Python Programming Lab PyPracticum in the summer semester 2022 at the Technical University of Munich (link to initial project problem statement).

It is also intended to form the basis for further research on an automated method to test the robustness of traffic sign classifiers, and to ultimately harden them.

Getting started

Installing dependencies

The dependencies are managed using Poetry.

If you have not yet installed Poetry:

$ pip install poetry

Change directory to the parent directory of this project and install the dependencies:

$ poetry install

Done! You can now run the Python scripts using:

$ poetry run python _____.py

Running an Example Strategy, let's make some data!

We included some example strategy implementations in rt_search_based/examples/. You can run them via the CLI to see how they perform on your system.

Running the pymoo example:

$ poetry run python main.py --example pymoo

Available examples:

bruteforce
- rather slow, as it is an exhaustive search (can take a long time for large minimal stickers, as millions of images need to be generated and evaluated by the model)
- can work in batch mode when a CUDA-capable GPU is available
pymoo
jmetalpy
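To see where the "millions of images" for the brute-force search come from, the sketch below counts every axis-aligned rectangular sticker placement on an image and multiplies by the number of candidate colors. The 32x32 input size and the eight colors are assumptions chosen for illustration, and `count_placements` and `count_candidates` are hypothetical helpers, not part of this project.

```python
def count_placements(image_w: int, image_h: int, sticker_w: int, sticker_h: int) -> int:
    """Number of positions where a sticker of the given size fits in the image."""
    if sticker_w > image_w or sticker_h > image_h:
        return 0
    return (image_w - sticker_w + 1) * (image_h - sticker_h + 1)


def count_candidates(
    image_w: int, image_h: int, max_w: int, max_h: int, num_colors: int
) -> int:
    """Total images an exhaustive search evaluates for stickers up to max_w x max_h."""
    total = 0
    for sticker_w in range(1, max_w + 1):
        for sticker_h in range(1, max_h + 1):
            total += count_placements(image_w, image_h, sticker_w, sticker_h)
    return total * num_colors


# Assumed setting: 32x32 inputs, any rectangle size, eight candidate colors.
print(count_candidates(32, 32, 32, 32, num_colors=1))  # 278784
print(count_candidates(32, 32, 32, 32, num_colors=8))  # 2230272
```

Even for a single sticker, the count grows roughly with the square of the image area, which is why stopping at the first (smallest) successful sticker and evaluating candidates in batches both matter.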

Starting the GUI

The results stored in the database can be viewed via the GUI web app, which can be launched using the CLI:

$ poetry run python main.py --gui

The GUI can then be accessed in a web browser on localhost, port 5000, or simply at http://127.0.0.1:5000

Screenshot

Database Persistence

The database stores the best results found by the different strategies, accumulating the best result each strategy finds across many runs. The updated CSV files with the new best solutions should then be committed to the repo.

The database is currently initialized for the whole GTSRB dataset. If a solution has not yet been found for a specific image, it stores a -1. Note: The database operates on CSV files and torch tensors and was implemented by us to keep performance high.
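The accumulate-the-best behavior with a -1 placeholder can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the project's database code: the column names `image_id` and `best_area`, the idea that a smaller sticker area counts as "better", and the helpers `load_best`, `save_best`, and `update_best` are all hypothetical, and the torch tensor side is left out.

```python
import csv
from pathlib import Path
from typing import Dict

NOT_FOUND = -1  # placeholder for images without a solution yet


def load_best(path: Path) -> Dict[int, int]:
    """Read image_id -> best_area from a CSV file with a header row."""
    with path.open(newline="") as handle:
        return {
            int(row["image_id"]): int(row["best_area"])
            for row in csv.DictReader(handle)
        }


def save_best(path: Path, best: Dict[int, int]) -> None:
    """Write the mapping back, sorted by image id for stable diffs."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["image_id", "best_area"])
        for image_id in sorted(best):
            writer.writerow([image_id, best[image_id]])


def update_best(best: Dict[int, int], image_id: int, area: int) -> bool:
    """Store `area` only if it beats the current entry; return True if stored."""
    current = best.get(image_id, NOT_FOUND)
    if current == NOT_FOUND or area < current:
        best[image_id] = area
        return True
    return False


best = {0: NOT_FOUND, 1: 40}
print(update_best(best, 0, 25))  # True: first solution for image 0
print(update_best(best, 1, 50))  # False: 40 is already better
```

Writing rows in sorted order keeps the committed CSV diffs small when only a few entries improve between runs.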

Authors and acknowledgment

Contributors:

Felix Elfering (felix.elfering+pyp@tum.de)
Johannes Volk (johannes.volk@tum.de)
Dejvi Zelo (dejvi.zelo@tum.de)

Detailed contributions can be found in CONTRIBUTIONS.md

Advisor:

Simon Speth (simon.speth@tum.de)

Project status

While the PyPracticum SS 2022 has come to an end, there are still many things that can be improved and added to this project that were out of scope for our five-week development time frame.

Further Impressions

Video demo showing a full search run with a genetic algorithm and a fitness function that considers the degree to which the classifier is fooled.

Short summary of the following: The highlighted pixels form the classifier's least robust areas on the respective traffic sign. This helps to better understand how the classifier works and where its weaknesses lie.

Screenshot