
# ViTGuard (under Artifact Evaluation from ACSAC 2024)

Code for the paper "ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer".

## Experimental Environment

### Hardware

Our experiments are conducted on a virtual machine hosted on a server with a 64-core AMD EPYC 7763 processor. The virtual machine runs Ubuntu 22.04.3 LTS and is allocated one NVIDIA A100 GPU.

### Software Installation

We use the following software packages:

- `python==3.10.12`
- `pytorch==2.0.1`
- `torchvision==0.15.2`
- `numpy==1.26.0`
- `transformers==4.33.3`

Run `pip3 install -r requirements.txt` to install the required packages.
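As a quick sanity check after installation, the one-liner below (our suggestion, not part of the repository) prints the installed versions and confirms that PyTorch can see the GPU:

```bash
# Optional check: print package versions and GPU visibility.
python3 -c "import torch, torchvision, transformers, numpy; \
print('torch', torch.__version__, '| torchvision', torchvision.__version__); \
print('transformers', transformers.__version__, '| numpy', numpy.__version__); \
print('CUDA available:', torch.cuda.is_available())"
```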

## Code Structure

The source code is structured into two main subfolders: `target_models` and `detection`. Within the `target_models` folder, you will find configurations for the target ViT classifiers and the adversarial attacks. The `detection` folder includes settings for the MAE model used in image reconstruction, along with configurations for the ViTGuard detectors.
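The sketch below summarizes the layout implied by the paths used in the steps that follow; it is an approximation, and the repository may contain additional files:

```
target_models/
├── run/          # train.py, attack.py, Table1.ipynb
├── results/      # created in Steps 1-2: ViT-16/<dataset>/weights.pth and adv_results/
└── WhiteBox.py   # white-box attack utilities (set the repository path on line 9)
detection/
├── run/          # train_mae.py, detection.py, table3.py, table4.py, table5.py
└── results/      # created in Step 3(1): <dataset>/weights.pth
```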

## Running the Code

**Note:** Steps 1 and 2 are optional, as the weights of the target model for the TinyImagenet dataset are available for download from this link. After downloading the file, move it to the `target_models` directory and unzip it by running `unzip results.zip && rm results.zip`. The resulting `results` folder also contains the adversarial examples generated using the TinyImagenet dataset.

**Note:** Step 3(1) is optional, as the model weights for ViTMAE are available for download from this link. After downloading the file, move it to the `detection` directory and unzip it by running `unzip results.zip && rm results.zip`.
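Both notes follow the same pattern; a minimal sketch for the target-model weights is shown below (the download location `~/Downloads/results.zip` is illustrative; adjust it to your system, and substitute `detection` for the ViTMAE weights):

```bash
# Place and unpack the downloaded target-model weights
# (for the ViTMAE weights, use the detection/ directory instead).
mv ~/Downloads/results.zip target_models/
cd target_models
unzip results.zip && rm results.zip
```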

**Note:** Users can proceed directly to Step 3(2) to execute the detection process.

### Step 1. Train a target model

In the main directory, run `cd target_models/run`.

A target ViT model can be trained by running:

```bash
python3 train.py --dataset TinyImagenet
```

The trained model will be saved to the `target_models/results/ViT-16/TinyImagenet/` subfolder as `weights.pth`. The `--dataset` argument can be changed to CIFAR10 or CIFAR100 as needed.
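For example, to fine-tune on CIFAR10 instead of TinyImagenet, only the `--dataset` flag changes:

```bash
python3 train.py --dataset CIFAR10
```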

### Step 2. Craft adversarial samples

**Note:** In the file `target_models/WhiteBox.py`, the path defined on line 9 should be modified to reflect the actual path of the repository on your system.

To craft adversarial samples, run:

```bash
python3 attack.py --dataset TinyImagenet --attack PGD
```

The `DataLoader` holding the adversarial samples will be stored in the `target_models/results/ViT-16/TinyImagenet/adv_results/` subfolder.

In this example, the PGD attack is used; it can be substituted with any of the other attack methods: FGSM, APGD, CW, SGM, PatchFool, AttentionFool, SE, and TR. The dataset can likewise be changed to CIFAR10 or CIFAR100. The notebook `target_models/run/Table1.ipynb` reports the classification accuracy of adversarial examples generated by the various attacks.
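To generate adversarial examples for every supported attack in one pass, a simple shell loop over the documented `--attack` values suffices (a convenience sketch, not a script shipped with the repository):

```bash
# Craft adversarial samples for all supported attacks on TinyImagenet.
for atk in FGSM PGD APGD CW SGM PatchFool AttentionFool SE TR; do
    python3 attack.py --dataset TinyImagenet --attack "$atk"
done
```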

### Step 3. Detect adversarial samples

The detection mechanism comprises two stages: (1) training an MAE model for image reconstruction and (2) employing ViTGuard detectors.

In the main directory, run `cd detection/run`.

(1) To train an MAE model, run:

```bash
python3 train_mae.py --dataset TinyImagenet
```

The trained model will be saved to the `detection/results/TinyImagenet/` subfolder as `weights.pth`.

(2) We propose two individual detectors, based on the attention maps and the CLS representation, respectively. To obtain the AUC score for a detector, run:

```bash
python3 detection.py --dataset TinyImagenet --attack PGD --detector Attention
```

The `--detector` argument can be set to CLS to evaluate the CLS-based detector, and the PGD attack can be substituted with any of the other attack methods: FGSM, APGD, CW, SGM, PatchFool, AttentionFool, SE, and TR.
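To evaluate both detectors against every attack in one pass, the nested loop below iterates over the documented argument values (again a convenience sketch, not part of the repository):

```bash
# Report AUC scores for both detectors under all supported attacks.
for det in Attention CLS; do
    for atk in FGSM PGD APGD CW SGM PatchFool AttentionFool SE TR; do
        python3 detection.py --dataset TinyImagenet --attack "$atk" --detector "$det"
    done
done
```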

### Step 4. Ablation Studies

(1) To replicate the results presented in Table 3, navigate to the `detection/run` directory and run:

```bash
python3 table3.py --attack PGD --detector RL
```

The detector can be replaced with Attention, CLS, PD_T10, or PD_T40, and the PGD attack can be substituted with CW, PatchFool, SGM, or TR.

(2) To replicate the results presented in Table 4, navigate to the `detection/run` directory and run:

```bash
python3 table4.py --attack PGD --detector Attention --masking salient
```

The `--masking` argument can also be set to non-salient or random, the detector can be replaced with CLS, and the PGD attack can be substituted with FGSM, CW, APGD, SGM, SE, or TR.

(3) To replicate the results presented in Table 5, first download the additional MAE model weights, trained with different masking ratios, from this link. Save the downloaded zip file in the `detection/run` directory and unzip it. Then, from the `detection/run` directory, run:

```bash
python3 table5.py --attack PGD --ratio 0.25 --detector Attention
```

The `--ratio` argument can also be set to 0.5 or 0.75, the detector can be replaced with CLS, and the PGD attack can be substituted with FGSM, CW, APGD, SGM, SE, or TR.
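To cover all three masking ratios in one pass (a convenience sketch, not part of the repository):

```bash
# Replicate Table 5 across the three MAE masking ratios.
for r in 0.25 0.5 0.75; do
    python3 table5.py --attack PGD --ratio "$r" --detector Attention
done
```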

## Running Time

For reference, we report the training and inference times for the ViT-16 and MAE models on the Tiny-ImageNet dataset. Note that the training time is reported per epoch, so the total training time varies with the number of epochs: we used 50 epochs to fine-tune ViT-16 (Step 1) and 500 epochs to train the MAE (Step 3(1)), i.e., roughly 50 × 255 s ≈ 3.5 hours and 500 × 660 s ≈ 92 hours in total, respectively. Inference time is measured per individual sample.

| Model  | Training (per epoch) | Inference (per sample) |
|--------|----------------------|------------------------|
| ViT-16 | 255 s                | 0.2 ms                 |
| MAE    | 660 s                | 2.4 ms                 |