Code for the paper "ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer".
Our experiments are conducted on a virtual machine hosted on a server with an AMD EPYC 7763 processor featuring a 64-core CPU. The virtual machine, running Ubuntu 22.04.3 LTS, is allocated access to one NVIDIA A100 GPU.
The required software packages are listed in `requirements.txt`. Install them by running:

```
pip3 install -r requirements.txt
```
The source code is structured into two main subfolders: `target_models` and `detection`. The `target_models` folder contains configurations for the target ViT classifiers and adversarial attacks. The `detection` folder includes settings for the MAE model used in image reconstruction, along with configurations for the ViTGuard detectors.
Note: Steps 1 and 2 are optional, as the weights of the target model for the TinyImagenet dataset are available for download from this link. After downloading the file, move it to the `target_models` directory and unzip it by running `unzip results.zip && rm results.zip`. The resulting `results` folder also contains the adversarial examples generated with the TinyImagenet dataset.
Note: Step 3(1) is optional, as the model weights for ViTMAE are available for download from this link. After downloading the file, move it to the `detection` directory and unzip it by running `unzip results.zip && rm results.zip`.
Note: Users can proceed directly to Step 3(2) to execute the detection process.
In the main directory, run `cd target_models/run`.
A target ViT model can be trained by running:

```
python3 train.py --dataset TinyImagenet
```

The trained model will be saved as `weights.pth` in the `target_models/results/ViT-16/TinyImagenet/` subfolder. The dataset used for training can be changed to `CIFAR10` or `CIFAR100` as needed.
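To train target models for all three supported datasets in one pass, the command above could be wrapped in a loop. The sketch below is a dry run that only echoes each invocation, so the commands can be checked before launching the long-running jobs:

```shell
# Dry-run sweep over the three supported datasets.
# Remove the leading "echo" to actually launch training.
for dataset in TinyImagenet CIFAR10 CIFAR100; do
  echo python3 train.py --dataset "$dataset"
done
```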
Note: In the file `target_models/WhiteBox.py`, the path defined on line 9 should be modified to reflect the actual path of the repository on your system.
To craft adversarial samples, run:

```
python3 attack.py --dataset TinyImagenet --attack PGD
```

The DataLoader holding the adversarial samples will be stored in the `target_models/results/ViT-16/TinyImagenet/adv_results` subfolder. In this example, the PGD attack is used; it can be substituted with other attack methods, including `FGSM`, `APGD`, `CW`, `SGM`, `PatchFool`, `AttentionFool`, `SE`, and `TR`. The dataset can be changed to `CIFAR10` or `CIFAR100`. The notebook `target_models/run/Table1.ipynb` shows the classification accuracy of adversarial examples generated by the various attacks.
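To generate adversarial examples for every supported attack, a loop like the following could be used. It is shown as a dry run that echoes the commands instead of executing them, since each attack can take a while:

```shell
# Dry-run sweep over all supported attacks on TinyImagenet.
# Drop the "echo" to actually craft the adversarial samples.
for attack in FGSM PGD APGD CW SGM PatchFool AttentionFool SE TR; do
  echo python3 attack.py --dataset TinyImagenet --attack "$attack"
done
```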
The detection mechanism comprises two stages: (1) training an MAE model for image reconstruction and (2) employing ViTGuard detectors.
In the main directory, run `cd detection/run`.
(1) To train an MAE model, run:

```
python3 train_mae.py --dataset TinyImagenet
```

The trained model will be saved as `weights.pth` in the `detection/results/TinyImagenet/` subfolder.
(2) We propose two individual detectors, based on the attention maps and the CLS representation, respectively. To get the AUC score for the detection method, run:

```
python3 detection.py --dataset TinyImagenet --attack PGD --detector Attention
```

The detector can be replaced with `CLS` to evaluate the CLS-based detector. The `PGD` attack can be substituted with other attack methods, including `FGSM`, `APGD`, `CW`, `SGM`, `PatchFool`, `AttentionFool`, `SE`, and `TR`.
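To evaluate every attack against both detectors in one go, a nested dry-run loop could be used (the `echo` prints each command instead of running it):

```shell
# Dry-run sweep: every supported attack against both detectors.
for detector in Attention CLS; do
  for attack in FGSM PGD APGD CW SGM PatchFool AttentionFool SE TR; do
    echo python3 detection.py --dataset TinyImagenet --attack "$attack" --detector "$detector"
  done
done
```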
(1) To replicate the results presented in Table 3, navigate to the `detection/run` directory and execute the following command:

```
python3 table3.py --attack PGD --detector RL
```

The detector can be replaced with `Attention`, `CLS`, `PD_T10`, or `PD_T40`. The `PGD` attack can be substituted with other attack methods, including `CW`, `PatchFool`, `SGM`, and `TR`.
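The full Table 3 grid of detectors and attacks could be covered with a nested loop; the sketch below echoes the commands rather than executing them:

```shell
# Dry-run sweep over the Table 3 detectors and attacks.
for detector in RL Attention CLS PD_T10 PD_T40; do
  for attack in PGD CW PatchFool SGM TR; do
    echo python3 table3.py --attack "$attack" --detector "$detector"
  done
done
```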
(2) To replicate the results presented in Table 4, navigate to the `detection/run` directory and execute the following command:

```
python3 table4.py --attack PGD --detector Attention --masking salient
```

The masking argument can also be set to `non-salient` or `random`. The detector can be replaced with `CLS`. The `PGD` attack can be substituted with other attack methods, including `FGSM`, `CW`, `APGD`, `SGM`, `SE`, and `TR`.
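To compare the three masking strategies for one attack/detector pair, a small dry-run loop could be used (remove the `echo` to run the evaluations):

```shell
# Dry-run sweep over the three masking strategies.
for masking in salient non-salient random; do
  echo python3 table4.py --attack PGD --detector Attention --masking "$masking"
done
```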
(3) To replicate the results presented in Table 5, begin by downloading the additional MAE model weights, trained with different masking ratios, from this link. Save the downloaded zip file in the `detection/run` directory and unzip it. Then, under the `detection/run` directory, execute the following command:

```
python3 table5.py --attack PGD --ratio 0.25 --detector Attention
```

The ratio argument can also be set to `0.5` or `0.75`. The detector can be replaced with `CLS`. The `PGD` attack can be substituted with other attack methods, including `FGSM`, `CW`, `APGD`, `SGM`, `SE`, and `TR`.
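Similarly, all three masking ratios could be swept with a dry-run loop that echoes each command before anything is actually executed:

```shell
# Dry-run sweep over the MAE masking ratios.
for ratio in 0.25 0.5 0.75; do
  echo python3 table5.py --attack PGD --ratio "$ratio" --detector Attention
done
```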
For reference, we present the training and inference times for the ViT-16 and MAE models on the Tiny-ImageNet dataset. Note that the training time is reported per epoch, so the actual total training time varies with the number of epochs. In this study, we employed 50 epochs for fine-tuning ViT-16 (Step 1) and 500 epochs for training MAE (Step 3(1)). The inference time is measured per individual sample.
Model | Training (per epoch) | Inference (per sample) |
---|---|---|
ViT-16 | 255 s | 0.2 ms |
MAE | 660 s | 2.4 ms |
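From the per-epoch figures above and the epoch counts used in this study (50 for ViT-16, 500 for MAE), a rough total training time can be computed with shell arithmetic:

```shell
# Approximate total training times on Tiny-ImageNet.
vit_total=$((50 * 255))    # 12750 s, roughly 3.5 hours
mae_total=$((500 * 660))   # 330000 s, roughly 92 hours
echo "ViT-16 total: ${vit_total} s"
echo "MAE total: ${mae_total} s"
```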