# ViTGuard (under Artifact Evaluation from ACSAC 2024)

Code for the paper "ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer".

## Experimental Environment

### Hardware

Our experiments are conducted on a virtual machine hosted on a server with an AMD EPYC 7763 64-core CPU. The virtual machine runs Ubuntu 22.04.3 LTS and has access to one NVIDIA A100 GPU.

### Software Installation

The required software packages are listed in `requirements.txt`. Run `pip3 install -r requirements.txt` to install them.

## Code Structure

The source code is organized into two main subfolders: `target_models` and `detection`. The `target_models` folder contains configurations for the target ViT classifiers and the adversarial attacks. The `detection` folder contains settings for the MAE model used in image reconstruction, along with configurations for the ViTGuard detectors.

## Running the Code

**Note:** Steps 1 and 2 are optional, as the weights of the target model for the TinyImagenet dataset are available for download from [this link](https://drive.google.com/file/d/14wTa5UngTWcNN4nymAsyjvrZSv9pjDVN/view?usp=drive_link). After downloading the file, move it to the `target_models` directory and unzip it by running `unzip results.zip && rm results.zip`. The `results` folder also contains the adversarial examples generated for the TinyImagenet dataset.

**Note:** Step 3(1) is optional, as the model weights for ViTMAE are available for download from [this link](https://drive.google.com/file/d/13KE103qawMLhIeBE8-U4kr9GoagkOwCs/view?usp=sharing). After downloading the file, move it to the `detection` directory and unzip it by running `unzip results.zip && rm results.zip`.

**Note:** With the downloaded weights, users can proceed directly to Step 3(2) to run the detection process.

### Step 1. Train a target model

In the main directory, run

```
cd target_models/run
```

A target ViT model can then be trained by running

```
python3 train.py --dataset TinyImagenet
```

The trained model is saved as `weights.pth` in the `target_models/results/ViT-16/TinyImagenet/` subfolder. The training dataset can be changed to `CIFAR10` or `CIFAR100` as needed.

### Step 2. Craft adversarial samples

**Note:** In the file `target_models/WhiteBox.py`, the path defined on line 9 should be modified to the actual path of the repository on your system.

To craft adversarial samples, run

```
python3 attack.py --dataset TinyImagenet --attack PGD
```

The DataLoader holding the adversarial samples is stored in the `target_models/results/ViT-16/TinyImagenet/adv_results` subfolder. This example uses the PGD attack; it can be substituted with other attack methods, including `FGSM`, `APGD`, `CW`, `SGM`, `PatchFool`, `AttentionFool`, `SE`, and `TR`. The dataset can be changed to `CIFAR10` or `CIFAR100`. The notebook `target_models/run/Table1.ipynb` shows the classification accuracy on adversarial examples generated by the various attacks.

### Step 3. Detect adversarial samples

The detection mechanism comprises two stages: (1) training an MAE model for image reconstruction and (2) employing the ViTGuard detectors.

In the main directory, run

```
cd detection/run
```

(1) To train an MAE model, run

```
python3 train_mae.py --dataset TinyImagenet
```

The trained model is saved as `weights.pth` in the `detection/results/TinyImagenet/` subfolder.

(2) We propose two individual detectors, based on attention maps and the CLS representation, respectively. To get the AUC score for the detection method, run

```
python3 detection.py --dataset TinyImagenet --attack PGD --detector Attention
```

The detector can be replaced with `CLS` to evaluate the CLS-based detector. The `PGD` attack can be substituted with other attack methods, including `FGSM`, `APGD`, `CW`, `SGM`, `PatchFool`, `AttentionFool`, `SE`, and `TR`.
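For intuition, here is a minimal sketch of how a detector of this kind can be scored (illustrative only, not the repository's implementation): the input is reconstructed by the MAE, attention maps are extracted from the target ViT for both the original and the reconstruction, and the per-sample distance between them serves as the detection score. The helpers `mae_reconstruct` and `vit_attention`, and the use of an L1 distance, are assumptions for illustration.

```python
# Conceptual sketch of an attention-based detector; `mae_reconstruct` and
# `vit_attention` are hypothetical stand-ins for the reconstruction and
# attention-extraction code in the `detection` folder.
import torch
from sklearn.metrics import roc_auc_score

def detection_scores(x, mae_reconstruct, vit_attention):
    """Per-sample distance between the attention maps of an input batch
    and its MAE reconstruction; larger distances suggest adversarial inputs."""
    with torch.no_grad():
        x_rec = mae_reconstruct(x)   # mask patches, then reconstruct the image
        attn = vit_attention(x)      # attention maps from the target ViT
        attn_rec = vit_attention(x_rec)
    # L1 distance averaged over heads and token positions (illustrative choice)
    return torch.abs(attn - attn_rec).flatten(1).mean(dim=1)

def detection_auc(clean_scores, adv_scores):
    """AUC of separating clean (label 0) from adversarial (label 1) inputs."""
    labels = torch.cat([torch.zeros_like(clean_scores),
                        torch.ones_like(adv_scores)])
    scores = torch.cat([clean_scores, adv_scores])
    return roc_auc_score(labels.cpu().numpy(), scores.cpu().numpy())
```

Replacing the attention maps with the CLS token representation gives the corresponding CLS-based score.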
### Step 4. Ablation Studies

(1) To replicate the results presented in Table 3, navigate to the `detection/run` directory and execute:

```
python3 table3.py --attack PGD --detector RL
```

The detector can be replaced with `Attention`, `CLS`, `PD_T10`, and `PD_T40`. The `PGD` attack can be substituted with other attack methods, including `CW`, `PatchFool`, `SGM`, and `TR`.

(2) To replicate the results presented in Table 4, navigate to the `detection/run` directory and execute:

```
python3 table4.py --attack PGD --detector Attention --masking salient
```

The masking argument can also be set to `non-salient` or `random`. The detector can be replaced with `CLS`. The `PGD` attack can be substituted with other attack methods, including `FGSM`, `CW`, `APGD`, `SGM`, `SE`, and `TR`.

(3) To replicate the results presented in Table 5, first download the additional MAE model weights, trained with different masking ratios, from [this link](https://drive.google.com/file/d/1rXWM7qQu6fVNUHjcqn_jz6Znte8JLY7u/view?usp=sharing). Save the downloaded zip file in the `detection/run` directory and unzip it. Then, in the `detection/run` directory, execute:

```
python3 table5.py --attack PGD --ratio 0.25 --detector Attention
```

The ratio argument can also be set to `0.5` or `0.75`. The detector can be replaced with `CLS`. The `PGD` attack can be substituted with other attack methods, including `FGSM`, `CW`, `APGD`, `SGM`, `SE`, and `TR`.

## Running Time

For reference, we report the training and inference times for the ViT-16 and MAE models on the Tiny-ImageNet dataset. Training time is reported per epoch, so the total training time varies with the number of epochs; in this study, we used 50 epochs for fine-tuning ViT-16 (Step 1) and 500 epochs for training the MAE (Step 3(1)). Inference time is measured per individual sample.

| Model | Training (per epoch) | Inference (per sample) |
| :---: | :---: | :---: |
| ViT-16 | 255 s | 0.2 ms |
| MAE | 660 s | 2.4 ms |
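For context on the table above, per-sample inference time can be measured along the following lines. This is a generic sketch, not the exact script used for these measurements; `model` and `batch` are placeholders for a loaded model and an input batch already on the GPU.

```python
# Generic timing sketch, not the exact measurement script behind the table
# above; `model` and `batch` are placeholders for a loaded ViT-16/MAE model
# and an input batch already on the GPU.
import time
import torch

def per_sample_inference_ms(model, batch, warmup=10, runs=100):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):      # warm up to exclude one-time setup costs
            model(batch)
        torch.cuda.synchronize()     # wait for queued GPU work before timing
        start = time.time()
        for _ in range(runs):
            model(batch)
        torch.cuda.synchronize()
        elapsed = time.time() - start
    return elapsed / (runs * batch.shape[0]) * 1000.0  # milliseconds per sample
```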