This is the official repository for our recent work: RoboFireFuseNet: Robust Fusion of Visible and Infrared Wildfire Imaging for Real-Time Flame and Smoke Segmentation
Concurrent segmentation of flames and smoke is challenging, as smoke frequently obscures fire in RGB imagery. Existing multimodal models are either too computationally demanding for real-time deployment or too lightweight to capture fine fire details that may escalate into large wildfires. Moreover, they are typically trained and validated on simplistic datasets, such as Corsican and FLAME1, which lack the dense smoke occlusion present in real-world scenarios. We introduce RoboFireFuseNet (RFFNet), a real-time deep neural network that fuses RGB and infrared (IR) data through attention-based mechanisms and a detail-preserving decoder. Beyond strong performance, RFFNet establishes a benchmark on a challenging, real-world wildfire dataset with dense smoke, creating a foundation for fair comparison in future flame-and-smoke segmentation research. Despite its lightweight design, it achieves state-of-the-art results on a general urban benchmark, demonstrating both efficiency and versatility. Its combination of accuracy, real-time performance, and multimodal fusion makes RFFNet well-suited for proactive, robust and accurate wildfire monitoring.
 
Schematic overview of the proposed fusion model.
Our model enhances PIDNet-Small by integrating SwinV2-T Transformer blocks, improving capacity and capturing long-range dependencies. To better preserve and extract modality-specific features, we introduce dedicated modality pathways. Additionally, we replace basic upscaling with a U-Net-style decoder, enhancing spatial reconstruction and producing high-resolution segmentation maps.
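The attention-based RGB/IR fusion can be illustrated with a minimal NumPy sketch. The function name and the use of channel means as attention scores are illustrative stand-ins for the network's learned attention branches, not the repository's actual code:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(rgb_feat, ir_feat):
    """Fuse RGB and IR feature maps of shape (H, W, C) with per-pixel
    attention weights. The real network learns these weights; here the
    channel mean of each modality stands in for the attention scores."""
    # Per-pixel modality scores: (H, W, 2)
    scores = np.stack([rgb_feat.mean(axis=-1), ir_feat.mean(axis=-1)], axis=-1)
    w = softmax(scores, axis=-1)  # convex per-pixel modality weights
    # Broadcast the scalar weight of each modality across channels
    fused = w[..., 0:1] * rgb_feat + w[..., 1:2] * ir_feat
    return fused
```

Because the weights are a per-pixel convex combination, each fused value stays between the corresponding RGB and IR feature values, letting the sharper modality (e.g. IR under dense smoke) dominate locally.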
 
Results on the wildfire benchmark:

| Method | Avg Recall (%) | mIoU (%) | Params (M) |
|---|---|---|---|
| PIDNet-RGB | 75.66 | 61.21 | 34.4 | 
| PIDNet-IR | 83.05 | 58.71 | 34.4 | 
| PIDNet-Early | 88.25 | 73.90 | 34.4 | 
| MFNet | 93.53 | 80.26 | 0.73 | 
| EFSIT* | 90.15 | 80.09 | 4.8 | 
| RTFNet | 73.87 | 65.42 | 185.24 | 
| GMNet | 67.53 | 54.08 | 153 | 
| EGFNet | 74.27 | 60.98 | 62.5 | 
| CRM-T | - | - | 59.1 | 
| Sigma-T | 92.6 | 86.27 | 48.3 | 
| Ours | 94.37 | 88.17 | 29.5 | 
 
Results on the urban benchmark:

| Method | Avg Recall (%) | mIoU (%) | Params (M) |
|---|---|---|---|
| PIDNet-m RGB | 65.59 | 51.52 | 34.4 | 
| PIDNet-m IR | 65.27 | 50.70 | 34.4 | 
| PIDNet-m Early | 69.59 | 52.62 | 34.4 | 
| MFNet | 59.1 | 39.7 | 0.73 | 
| RTFNet | 63.08 | 53.2 | 185.24 | 
| GMNet | 74.1 | 57.3 | 153 | 
| EGFNet | 72.7 | 54.8 | 62.5 | 
| CRM-T | - | 59.7 | 59.1 | 
| Sigma-T | 71.3 | 60.23 | 48.3 | 
| Ours | 71.1 | 60.6 | 29.5 | 
 
Python 3.10.12 is recommended. Install the dependencies with:

```
pip install -r requirements.txt
```

Customize configurations via the `config/` folder or override them with inline arguments.
Train on the wildfire dataset:

```
python train.py --yaml_file wildfire.yaml --LR 0.001 --BATCHSIZE 5 --WD 0.00005 --SESSIONAME "train_simple" --EPOCHS 500 --DEVICE "cuda:0" --STOPCOUNTER 30 --ONLINELOG False --PRETRAINED "weights/pretrained_480x640_w8_2_6.pth" --OPTIM "ADAM" --SCHED "COS"
```

Train on the urban dataset:

```
python train.py --yaml_file urban.yaml --LR 0.001 --BATCHSIZE 5 --WD 0.00001 --SESSIONAME "train_simple" --EPOCHS 500 --DEVICE "cuda:0" --STOPCOUNTER 30 --ONLINELOG False --PRETRAINED "weights/pretrained_480x640_w8_2_6.pth" --OPTIM "ADAM" --SCHED "COS"
```
Test on the wildfire dataset:

```
python test.py --yaml_file wildfire.yaml --SESSIONAME "train_simple" --DEVICE "cuda:0" --PRETRAINED "weights/robo_fire_best.pth"
```

Test on the urban dataset:

```
python test.py --yaml_file urban.yaml --SESSIONAME "train_simple" --DEVICE "cuda:0" --PRETRAINED "weights/robo_urban.pth"
```

Demo data must follow this naming convention: `<prefix>_rgb_<postfix>.png` for RGB images, `<prefix>_ir_<postfix>.png` for IR images, and a `.txt` file with rows formatted as `<prefix>_XXX_<postfix>.png`. Replace `<prefix>` and `<postfix>` with any values, keeping `rgb` and `ir` to indicate the modality. Optionally, include ground-truth files named `<prefix>_gt_<postfix>.png` to compute metrics. Run the wildfire demo with `python test.py --yaml_file wildfire_demo.yaml` or the urban demo with `python test.py --yaml_file urban_demo.yaml`. If you use a custom `.txt` file instead of `demo_fire.txt` or `demo_urban.txt`, adjust the YAML config files accordingly. (TODO: complete this information when the paper is accepted.)
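Given that naming convention, a small helper can generate the demo `.txt` list from a folder of images. This is a hypothetical utility for convenience, not a script shipped with the repository:

```python
import os

def build_demo_list(image_dir, out_txt):
    """Write a demo list file following the naming convention above:
    one row per sample, formatted <prefix>_XXX_<postfix>.png, where
    test.py substitutes XXX with 'rgb', 'ir' (and optionally 'gt')
    to locate each modality on disk."""
    rows = [name.replace("_rgb_", "_XXX_")
            for name in sorted(os.listdir(image_dir))
            if name.endswith(".png") and "_rgb_" in name]
    with open(out_txt, "w") as f:
        f.write("\n".join(rows) + "\n")
    return rows
```

Point `image_dir` at the folder containing the paired `_rgb_`/`_ir_` images, then reference the generated `.txt` file in the demo YAML config.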
[1] PIDNet: Xu, Jiacong, et al. "PIDNet: A Real-Time Semantic Segmentation Network Inspired by PID Controllers." IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 19529-19539.
[2] FLAME2: Hopkins, Bryce, Leo O'Neill, Fatemeh Afghah, Abolfazl Razi, Eric Rowell, Adam Watts, Peter Fule, and Janice Coen. "FLAME 2: Fire Detection and Modeling: Aerial Multi-Spectral Image Dataset." IEEE Dataport, August 29, 2022. doi: https://dx.doi.org/10.21227/swyw-6j78.