UniFormer-S (Kinetics-400) – acaua mirror (pure-PyTorch port)

Pure-PyTorch port of UniFormer-S (video classification, trained on Kinetics-400 with 16-frame clips at sampling stride 8) hosted under CondadosAI/ for use with the acaua computer vision library.

The architecture is re-implemented in pure PyTorch under acaua.adapters.uniformer.video: no mmcv, no mmengine, no mmaction2, no trust_remote_code, and no timm runtime dependency. The weights are converted from the upstream .pth checkpoint to safetensors using acaua's state-dict key naming (backbone.* + head.fc.*). They are not drop-in compatible with timm or Sense-X/UniFormer loaders; they are designed to load cleanly into acaua's nn.Module tree under load_state_dict(strict=True).
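To illustrate what "acaua's state-dict key naming" means in practice, here is a minimal sketch of the kind of remapping the conversion applies. The exact rules live in scripts/convert_uniformer_video.py; the upstream key names below are illustrative examples, not a spec.

```python
# Illustrative sketch (NOT the actual conversion script): map an upstream
# UniFormer checkpoint key onto acaua's naming scheme, where the classifier
# lives under head.fc.* and the trunk is nested under backbone.*.
def remap_key(upstream_key: str) -> str:
    if upstream_key.startswith("head."):
        # e.g. "head.weight" -> "head.fc.weight"
        return "head.fc." + upstream_key[len("head."):]
    # e.g. "blocks1.0.norm1.weight" -> "backbone.blocks1.0.norm1.weight"
    return "backbone." + upstream_key

print(remap_key("head.weight"))             # head.fc.weight
print(remap_key("blocks1.0.norm1.weight"))  # backbone.blocks1.0.norm1.weight
```

Because the remapped tree must load under load_state_dict(strict=True), every converted key has to match the acaua module tree exactly; any stray or missing key fails the conversion round-trip check.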

Provenance

  • Upstream code: Sense-X/UniFormer @ main (Apache-2.0)
  • Upstream weights: Sense-X/uniformer_video at revision f9448914e6161573b14ba47b72fcef170e03a1f9 (MIT)
  • Upstream file: uniformer_small_k400_16x8.pth
  • Upstream SHA256: d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db
  • Upstream factory: uniformer_small() in video_classification/models/uniformer.py
  • Conversion script: scripts/convert_uniformer_video.py
  • Paper: Li et al., UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning, 2022
  • Params: 22M
  • Top-1 (Kinetics-400, 16 frames x 1 clip x 1 crop): 78.4%
  • FLOPs: 41.8G
  • Training recipe: 16 input frames, sampling stride 8, 224x224 center crop, ImageNet mean/std normalization
  • Mirrored on: 2026-04-24
  • Mirrored by: CondadosAI/acaua
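If you re-run the conversion yourself, verify the upstream checkpoint against the SHA256 listed above before converting. A small helper (the hashing logic is standard hashlib; only the file path is yours to supply):

```python
import hashlib

# SHA256 of uniformer_small_k400_16x8.pth, from the provenance table above.
EXPECTED = "d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db"

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so large checkpoints don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# assert sha256_of("uniformer_small_k400_16x8.pth") == EXPECTED
```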

Usage via acaua

import acaua

# MIT-declared weights require the explicit opt-in.
model = acaua.Model.from_pretrained(
    "CondadosAI/uniformer_s_k400", allow_non_apache=True
)
result = model.predict("video.mp4")
print(result.labels)   # tuple of top-5 Kinetics-400 action labels
print(result.scores)   # aligned float32 probabilities

Requires pip install 'acaua[video]' for the TorchCodec-backed video decoder, plus a system-level FFmpeg installation.
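acaua's predict() handles decoding and preprocessing internally. If you decode frames yourself, the training recipe above (16 frames at stride 8, 224x224 center crop, ImageNet mean/std) can be sketched as follows; the function name and the assumption that frames are already large enough to crop are illustrative (real pipelines typically resize the short side first):

```python
import torch

# ImageNet normalization constants, per the training recipe.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1, 1)

def preprocess(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, H, W, C) uint8 RGB with H, W >= 224.
    Returns a (1, C, 16, 224, 224) float32 clip tensor."""
    idx = torch.arange(16) * 8                 # 16 frames, sampling stride 8
    clip = frames[idx % frames.shape[0]]       # wrap around short videos
    clip = clip.permute(3, 0, 1, 2).float() / 255.0   # -> (C, T, H, W) in [0, 1]
    _, _, h, w = clip.shape                    # center-crop 224x224
    top, left = (h - 224) // 2, (w - 224) // 2
    clip = clip[:, :, top:top + 224, left:left + 224]
    return ((clip - MEAN) / STD).unsqueeze(0)  # add batch dim
```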

Files in this mirror

  • model.safetensors – acaua-format weights (key-remapped, verified round-trip under load_state_dict(strict=True) at conversion time).
  • labels.json – JSON array of 400 Kinetics-400 action labels in index order. Read by the adapter at load time.
  • config.json – minimal metadata: acaua_task=video_classification, num_frames, num_classes.
  • NOTICE – attribution chain (code AND weights).
  • LICENSE – Apache-2.0.
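config.json is deliberately minimal. A representative shape, using only the fields listed above (values follow the model card: 16 frames, 400 classes):

```json
{
  "acaua_task": "video_classification",
  "num_frames": 16,
  "num_classes": 400
}
```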

License and attribution

The adapter code (this repository) is redistributed under Apache-2.0. The underlying weights carry upstream's MIT declaration (compatible and permissively redistributable). The acaua UniFormer-video adapter is itself a derivative work of the upstream PyTorch implementation; see NOTICE for the required attribution chain.

Citation

@misc{li2022uniformervideo,
  title        = {UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning},
  author       = {Li, Kunchang and Wang, Yali and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  year         = {2022},
  eprint       = {2201.04676},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV},
}