# UniFormer-S (Kinetics-400): acaua mirror (pure-PyTorch port)
Pure-PyTorch port of UniFormer-S (video classification, trained on
Kinetics-400 with 16-frame clips at sampling stride 8) hosted under
CondadosAI/ for use with the acaua
computer vision library.
The architecture has been re-implemented in pure PyTorch under
acaua.adapters.uniformer.video: no mmcv, no mmengine, no mmaction2,
no trust_remote_code, and no timm runtime dependency.

The weights are converted from the upstream .pth checkpoint to
safetensors with acaua's state-dict key naming (backbone.* +
head.fc.*). They are not drop-in compatible with timm or
Sense-X/UniFormer loaders; they are designed to load cleanly into
acaua's nn.Module tree with load_state_dict(strict=True).
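
For illustration, here is a minimal sketch of the kind of key remapping the
conversion performs. The authoritative rules live in
scripts/convert_uniformer_video.py; the exact upstream key layout and the two
patterns below are assumptions.

```python
import torch
from safetensors.torch import save_file

def remap(key: str) -> str:
    # Hypothetical patterns; the real rules live in
    # scripts/convert_uniformer_video.py.
    if key.startswith("head."):                # classifier -> head.fc.*
        return "head.fc." + key[len("head."):]
    return "backbone." + key                   # trunk -> backbone.*

ckpt = torch.load("uniformer_small_k400_16x8.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # assumption: upstream may nest under "model"

# safetensors requires contiguous tensors and flat string keys.
remapped = {remap(k): v.contiguous() for k, v in state.items()}
save_file(remapped, "model.safetensors")
```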
## Provenance
| Field | Value |
|---|---|
| Upstream code | Sense-X/UniFormer @ main (Apache-2.0) |
| Upstream weights | Sense-X/uniformer_video at revision f9448914e6161573b14ba47b72fcef170e03a1f9 (MIT) |
| Upstream file | uniformer_small_k400_16x8.pth |
| Upstream SHA256 | d5fd7b0c49ee6a5422ef5d0c884d962c742003bfbd900747485eb99fa269d0db |
| Upstream factory | uniformer_small() in video_classification/models/uniformer.py |
| Conversion script | scripts/convert_uniformer_video.py |
| Paper | Li et al., UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning, 2022 |
| Params | 22M |
| Top-1 (Kinetics-400, 16 frames x 1 clip x 1 crop) | 78.4% |
| FLOPs | 41.8G |
| Training recipe | 16 input frames, sampling stride 8, 224x224 center crop, ImageNet mean/std normalization (see the sketch after this table) |
| Mirrored on | 2026-04-24 |
| Mirrored by | CondadosAI/acaua |
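
Read concretely, the recipe implies the following clip sampling and
normalization. This is a minimal pure-PyTorch sketch, assuming frames arrive
as a decoded uint8 (T, C, H, W) tensor; the short-side resize to 256 before
the crop is also an assumption about the upstream eval pipeline.

```python
import torch
import torch.nn.functional as F

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1, 1)

def preprocess_clip(frames: torch.Tensor, num_frames: int = 16,
                    stride: int = 8) -> torch.Tensor:
    """uint8 (T, C, H, W) video -> normalized (C, num_frames, 224, 224) clip."""
    t = frames.shape[0]
    # Stride-8 window; clamping repeats the last frame for short videos.
    idx = (torch.arange(num_frames) * stride).clamp(max=t - 1)
    clip = frames[idx].float() / 255.0                     # (T', C, H, W)

    # Assumed pre-crop step: resize the short side to 256 ...
    h, w = clip.shape[-2:]
    scale = 256 / min(h, w)
    clip = F.interpolate(clip, size=(round(h * scale), round(w * scale)),
                         mode="bilinear", align_corners=False)

    # ... then take the central 224x224 crop, per the recipe above.
    h, w = clip.shape[-2:]
    top, left = (h - 224) // 2, (w - 224) // 2
    clip = clip[..., top:top + 224, left:left + 224]

    # Channels-first (C, T, H, W) with ImageNet mean/std normalization.
    return (clip.permute(1, 0, 2, 3) - IMAGENET_MEAN) / IMAGENET_STD
```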
## Usage via acaua
```python
import acaua

# MIT-declared weights require the explicit opt-in.
model = acaua.Model.from_pretrained(
    "CondadosAI/uniformer_s_k400", allow_non_apache=True
)

result = model.predict("video.mp4")
print(result.labels)  # tuple of top-5 Kinetics-400 action labels
print(result.scores)  # aligned float32 probabilities
```
Requires pip install 'acaua[video]' for the TorchCodec-backed video
decoder, plus a system-level ffmpeg installation.
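
For reference, decoding frames yourself with TorchCodec looks roughly like
the sketch below, which produces the kind of uint8 tensor the preprocessing
sketch under Provenance expects. The VideoDecoder calls shown are TorchCodec's
public API, but treat the exact signatures as an assumption against your
installed version; how acaua wires the decoder up internally is not documented
in this card.

```python
import torch
from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("video.mp4")   # needs ffmpeg available on the system
total = decoder.metadata.num_frames

# Same 16-frame / stride-8 window the model was trained with.
idx = (torch.arange(16) * 8).clamp(max=total - 1)
frames = decoder.get_frames_at(indices=idx.tolist()).data  # uint8 (T, C, H, W)
```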
## Files in this mirror
- model.safetensors: acaua-format weights (key-remapped, verified to round-trip under load_state_dict(strict=True) at conversion time).
- labels.json: JSON array of 400 Kinetics-400 action labels in index order. Read by the adapter at load time.
- config.json: minimal metadata (acaua_task=video_classification, num_frames, num_classes).
- NOTICE: attribution chain (code AND weights).
- LICENSE: Apache-2.0.
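
As a quick sanity check of a local download, the files above can be
cross-validated against each other; this sketch uses only safetensors and the
standard library.

```python
import json
from safetensors.torch import load_file

state = load_file("model.safetensors")
with open("labels.json") as f:
    labels = json.load(f)
with open("config.json") as f:
    config = json.load(f)

# Invariants documented in the file list above.
assert config["acaua_task"] == "video_classification"
assert config["num_classes"] == len(labels) == 400
assert all(k.startswith(("backbone.", "head.fc.")) for k in state)
print(len(state), "tensors,", config["num_frames"], "frames per clip")
```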
## License and attribution
The adapter code (this repository) is redistributed under Apache-2.0.
The underlying weights carry upstream's MIT declaration (compatible and
permissively redistributable). The acaua UniFormer-video adapter is
itself a derivative work of the upstream PyTorch implementation; see
NOTICE for the required attribution chain.
## Citation
```bibtex
@misc{li2022uniformervideo,
  title         = {UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning},
  author        = {Li, Kunchang and Wang, Yali and Gao, Peng and Song, Guanglu and Liu, Yu and Li, Hongsheng and Qiao, Yu},
  year          = {2022},
  eprint        = {2201.04676},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
}
```