UPerNet with FiRE-ViT Backbone for Semantic Segmentation

This is a UPerNet semantic segmentation model with a FiRE-ViT Tiny (Vision Transformer with Rotary Position Embeddings) backbone, trained on the ADE20K dataset.

Model Description

Architecture: UPerNet
Backbone: FiRE-ViT Tiny
Dataset: ADE20K
Task: Semantic Segmentation
Framework: MMSegmentation

Training Results

Metric	Value
mIoU	24.38%
mAcc	33.57%
aAcc	71.33%
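
For readers unfamiliar with the three metrics, the following is a minimal sketch of how mIoU, mAcc, and aAcc are commonly derived from a pixel-level confusion matrix. The function name `segmentation_metrics` and the toy 2-class matrix are illustrative assumptions and do not come from this model's evaluation code; MMSegmentation's exact handling of classes absent from the data may differ.

```python
def segmentation_metrics(conf):
    """Compute aAcc, mAcc and mIoU from a square confusion matrix.

    conf[i][j] = number of pixels whose ground-truth class is i
    and whose predicted class is j. (Hypothetical helper.)
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    diag = [conf[c][c] for c in range(n)]                          # correct pixels per class
    gt = [sum(conf[c]) for c in range(n)]                          # ground-truth pixels per class
    pred = [sum(conf[r][c] for r in range(n)) for c in range(n)]   # predicted pixels per class

    # Per-class IoU = intersection / union; classes with an empty union are skipped.
    ious = [diag[c] / (gt[c] + pred[c] - diag[c])
            for c in range(n) if gt[c] + pred[c] - diag[c] > 0]
    # Per-class accuracy = correct pixels / ground-truth pixels; absent classes are skipped.
    accs = [diag[c] / gt[c] for c in range(n) if gt[c] > 0]

    return {
        "aAcc": sum(diag) / total,        # overall pixel accuracy
        "mAcc": sum(accs) / len(accs),    # mean of per-class accuracies
        "mIoU": sum(ious) / len(ious),    # mean of per-class IoUs
    }


# Toy 2-class example (made-up counts, not ADE20K results).
toy_conf = [[8, 2],
            [1, 9]]
metrics = segmentation_metrics(toy_conf)
print(metrics)
```

Because aAcc weights every pixel equally while mIoU and mAcc weight every class equally, a model can score a high aAcc and a much lower mIoU when frequent classes are segmented well and rare classes poorly.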

Usage

from mmseg.apis import init_model, inference_model

config_file = 'upernet_fire_vit_tiny_512x512_ade20k.py'
checkpoint_file = 'best_mIoU_iter_40000.pth'

# Initialize the model
model = init_model(config_file, checkpoint_file, device='cuda:0')

# Inference on an image
result = inference_model(model, 'demo.jpg')
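
As a follow-up to the snippet above, here is a small sketch of summarizing a predicted label map once it has been extracted from `result`. It assumes MMSegmentation 1.x, where the prediction is typically available as a tensor under `result.pred_sem_seg.data` and could be converted to nested lists with something like `.squeeze(0).tolist()`; that access path is an assumption to verify against your installed version. The helper `class_pixel_fractions` and the toy mask are hypothetical and use only the standard library.

```python
from collections import Counter


def class_pixel_fractions(label_map):
    """Return {class_index: fraction of pixels} for a 2-D label map.

    label_map is a list of rows, each row a list of integer class indices.
    (Hypothetical helper, not part of MMSegmentation.)
    """
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


# Toy 2x3 mask standing in for a real prediction.
toy_mask = [
    [0, 0, 2],
    [0, 2, 2],
]
print(class_pixel_fractions(toy_mask))
```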

Training Configuration

The model was trained with the following configuration:

Input size: 512x512
Training iterations: 40,000
Optimizer: AdamW
Learning rate scheduler: Polynomial decay
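
To illustrate the polynomial decay schedule listed above, the sketch below evaluates its usual closed form over the 40,000 training iterations. The base learning rate, the `power` exponent, and the `eta_min` floor are not stated in this card, so the values used here are placeholders; `poly_lr` is a hypothetical helper, not the scheduler class used in training.

```python
def poly_lr(step, base_lr, max_iters=40_000, power=1.0, eta_min=0.0):
    """Closed-form polynomial decay from base_lr down to eta_min.

    lr(step) = (base_lr - eta_min) * (1 - step / max_iters) ** power + eta_min
    base_lr, power and eta_min are assumed values, not taken from the model card.
    """
    step = min(max(step, 0), max_iters)        # clamp to the training range
    factor = (1 - step / max_iters) ** power
    return (base_lr - eta_min) * factor + eta_min


# Print the learning rate at a few checkpoints for a placeholder base_lr.
for it in (0, 10_000, 20_000, 40_000):
    print(it, poly_lr(it, base_lr=1e-4))
```

With `power=1.0` the schedule is a straight line from the base rate to `eta_min`; larger exponents decay faster early in training.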

Citation

If you use this model, please cite:

@misc{rope-vit-segmentation,
  author = {VLG IITR},
  title = {UPerNet with FiRE-ViT for Semantic Segmentation},
  year = {2026},
  publisher = {Hugging Face},
}

License

This model is released under the Apache 2.0 license.
