Abstract
We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts (MoE) model with 400B total parameters and 13B activated per token. We additionally report on Trinity Nano (6B total parameters, 1B activated per token) and Trinity Mini (26B total parameters, 3B activated per token). The models share a modern architecture that includes interleaved local and global attention, gated attention, depth-scaled sandwich norm, and sigmoid routing for the MoE layers. For Trinity Large, we also introduce a new MoE load-balancing strategy, Soft-clamped Momentum Expert Bias Updates (SMEBU). All models were trained with the Muon optimizer, and all three completed training with zero loss spikes. Trinity Nano and Trinity Mini were pre-trained on 10 trillion tokens, and Trinity Large on 17 trillion tokens. The model checkpoints are available at https://huggingface.co/arcee-ai.
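The abstract names sigmoid routing for the MoE layers and a load-balancing rule, SMEBU, without giving their formulation. The PyTorch sketch below shows generic sigmoid top-k routing with an auxiliary expert-selection bias, plus a hypothetical momentum-smoothed, soft-clamped bias update suggested by the name SMEBU. All function names, shapes, hyperparameters, and the tanh soft clamp are illustrative assumptions, not the report's actual method.

```python
import torch


def sigmoid_route(hidden, router_weight, expert_bias, top_k=4):
    """Sigmoid top-k routing; the bias affects selection only, not gate values.
    Names and shapes are illustrative."""
    # Per-expert affinity scores via sigmoid gating instead of softmax.
    scores = torch.sigmoid(hidden @ router_weight)          # [n_tokens, n_experts]
    # Adding the bias nudges selection toward under-loaded experts.
    _, expert_idx = torch.topk(scores + expert_bias, top_k, dim=-1)
    gates = torch.gather(scores, -1, expert_idx)
    gates = gates / gates.sum(dim=-1, keepdim=True)         # normalize selected gates
    return expert_idx, gates


def smebu_step(expert_bias, bias_momentum, load_fraction,
               step_size=1e-3, momentum=0.9, clamp=1.0):
    """Hypothetical SMEBU-style update: a momentum-smoothed correction of the
    routing bias toward uniform expert load, bounded by a tanh soft clamp.
    This is only a plausible reading of the name, not the published rule."""
    n_experts = expert_bias.numel()
    # Positive error => expert received less than its uniform share.
    load_error = 1.0 / n_experts - load_fraction
    bias_momentum = momentum * bias_momentum + (1.0 - momentum) * load_error
    expert_bias = expert_bias + step_size * bias_momentum
    # "Soft clamp": keep the bias bounded smoothly instead of a hard cutoff.
    expert_bias = clamp * torch.tanh(expert_bias / clamp)
    return expert_bias, bias_momentum


# Toy usage: 16 tokens, hidden size 64, 32 experts.
hidden = torch.randn(16, 64)
router_weight = torch.randn(64, 32)
expert_bias = torch.zeros(32)
bias_momentum = torch.zeros(32)

expert_idx, gates = sigmoid_route(hidden, router_weight, expert_bias)
# Fraction of routed slots assigned to each expert in this batch.
load_fraction = torch.bincount(expert_idx.flatten(), minlength=32).float()
load_fraction = load_fraction / load_fraction.sum()
expert_bias, bias_momentum = smebu_step(expert_bias, bias_momentum, load_fraction)
```

In this reading, load balancing is handled entirely through the selection bias rather than an auxiliary loss, with momentum smoothing the per-batch load signal and the soft clamp preventing the bias from overwhelming the learned affinities; the report itself should be consulted for the actual SMEBU definition.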