# UniFormer-S (Kinetics-400): acaua mirror (pure-PyTorch port)
Pure-PyTorch port of UniFormer-S (video classification, trained on
Kinetics-400 with 16-frame clips at sampling stride 8) hosted under
CondadosAI/ for use with the acaua
computer vision library.
The architecture has been re-implemented in pure PyTorch under
acaua.adapters.uniformer.video: no mmcv, no mmengine, no mmaction2,
no trust_remote_code, and no timm runtime dependency.

The weights are converted from the upstream .pth checkpoint to
safetensors with acaua's state-dict key naming (backbone.* +
head.fc.*). They are not drop-in compatible with timm or
Sense-X/UniFormer loaders; they are designed to load cleanly into
acaua's nn.Module tree with load_state_dict(strict=True).
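
For illustration, here is a minimal sketch of the kind of key remapping the
conversion performs. The authoritative rules live in
scripts/convert_uniformer_video.py; the exact upstream key layout and the two
patterns below are assumptions.

```python
import torch
from safetensors.torch import save_file

def remap(key: str) -> str:
    # Hypothetical patterns; the real rules live in
    # scripts/convert_uniformer_video.py.
    if key.startswith("head."):                # classifier -> head.fc.*
        return "head.fc." + key[len("head."):]
    return "backbone." + key                   # trunk -> backbone.*

ckpt = torch.load("uniformer_small_k400_16x8.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # assumption: upstream may nest under "model"

# safetensors requires contiguous tensors and flat string keys.
remapped = {remap(k): v.contiguous() for k, v in state.items()}
save_file(remapped, "model.safetensors")
```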
## Provenance
| Field | Value |
|---|---|
| Upstream code | Sense-X/UniFormer @ main (Apache-2.0) |
| Upstream weights | Sense-X/uniformer_video at revision f9448914e6161573b14ba47b72fcef170e03a1f9 (MIT) |
| Upstream file | uniformer_small_k400_16x8.pth |
| Upstream SHA256 | d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db |
| Upstream factory | uniformer_small() in video_classification/models/uniformer.py |
| Conversion script | scripts/convert_uniformer_video.py |
| Paper | Li et al., UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning, 2022 |
| Params | 22M |
| Top-1 (Kinetics-400, 16 frames x 1 clip x 1 crop) | 78.4% |
| FLOPs | 41.8G |
| Training recipe | 16 input frames, sampling stride 8, 224x224 center crop, ImageNet mean/std normalization (see the sketch after this table) |
| Mirrored on | 2026-04-24 |
| Mirrored by | CondadosAI/acaua |
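
Read concretely, the recipe implies the following clip sampling and
normalization. This is a minimal pure-PyTorch sketch, assuming frames arrive
as a decoded uint8 (T, C, H, W) tensor; the short-side resize to 256 before
the crop is also an assumption about the upstream eval pipeline.

```python
import torch
import torch.nn.functional as F

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1, 1)

def preprocess_clip(frames: torch.Tensor, num_frames: int = 16,
                    stride: int = 8) -> torch.Tensor:
    """uint8 (T, C, H, W) video -> normalized (C, num_frames, 224, 224) clip."""
    t = frames.shape[0]
    # Stride-8 window; clamping repeats the last frame for short videos.
    idx = (torch.arange(num_frames) * stride).clamp(max=t - 1)
    clip = frames[idx].float() / 255.0                     # (T', C, H, W)

    # Assumed pre-crop step: resize the short side to 256 ...
    h, w = clip.shape[-2:]
    scale = 256 / min(h, w)
    clip = F.interpolate(clip, size=(round(h * scale), round(w * scale)),
                         mode="bilinear", align_corners=False)

    # ... then take the central 224x224 crop, per the recipe above.
    h, w = clip.shape[-2:]
    top, left = (h - 224) // 2, (w - 224) // 2
    clip = clip[..., top:top + 224, left:left + 224]

    # Channels-first (C, T, H, W) with ImageNet mean/std normalization.
    return (clip.permute(1, 0, 2, 3) - IMAGENET_MEAN) / IMAGENET_STD
```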
## Usage via acaua
```python
import acaua

# MIT-declared weights require the explicit opt-in.
model = acaua.Model.from_pretrained(
    "CondadosAI/uniformer_s_k400", allow_non_apache=True
)

result = model.predict("video.mp4")
print(result.labels)  # tuple of top-5 Kinetics-400 action labels
print(result.scores)  # aligned float32 probabilities
```
Requires pip install 'acaua[video]' for the TorchCodec-backed video
decoder, plus a system-level ffmpeg installation.
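
For reference, decoding frames yourself with TorchCodec looks roughly like
the sketch below, which produces the kind of uint8 tensor the preprocessing
sketch under Provenance expects. The VideoDecoder calls shown are TorchCodec's
public API, but treat the exact signatures as an assumption against your
installed version; how acaua wires the decoder up internally is not documented
in this card.

```python
import torch
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("video.mp4")   # needs ffmpeg available on the system
total = decoder.metadata.num_frames

# Same 16-frame / stride-8 window the model was trained with.
idx = (torch.arange(16) * 8).clamp(max=total - 1)
frames = decoder.get_frames_at(indices=idx.tolist()).data  # uint8 (T, C, H, W)
```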
## Files in this mirror
- model.safetensors: acaua-format weights (key-remapped, verified to round-trip under load_state_dict(strict=True) at conversion time).
- labels.json: JSON array of 400 Kinetics-400 action labels in index order. Read by the adapter at load time.
- config.json: minimal metadata (acaua_task=video_classification, num_frames, num_classes).
- NOTICE: attribution chain (code AND weights).
- LICENSE: Apache-2.0.
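
As a quick sanity check of a local download, the files above can be
cross-validated against each other; this sketch uses only safetensors and the
standard library.

```python
import json
from safetensors.torch import load_file

state = load_file("model.safetensors")
with open("labels.json") as f:
    labels = json.load(f)
with open("config.json") as f:
    config = json.load(f)

# Invariants documented in the file list above.
assert config["acaua_task"] == "video_classification"
assert config["num_classes"] == len(labels) == 400
assert all(k.startswith(("backbone.", "head.fc.")) for k in state)
print(len(state), "tensors,", config["num_frames"], "frames per clip")
```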
## License and attribution
The adapter code (this repository) is redistributed under Apache-2.0.
The underlying weights carry upstream's MIT declaration (compatible and
permissively redistributable). The acaua UniFormer-video adapter is
itself a derivative work of the upstream PyTorch implementation; see
NOTICE for the required attribution chain.
## Citation
```bibtex
@misc{li2022uniformervideo,
  title         = {UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning},
  author        = {Li, Kunchang and Wang, Yali and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  year          = {2022},
  eprint        = {2201.04676},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
}
```