Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM
Abstract
We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Gamayun addresses the lack of research on small non-English-centric LLMs by adopting a novel two-stage pre-training strategy: balanced multilingual training for cross-lingual alignment, followed by high-quality English enrichment to transfer performance gains across languages. Our model supports 12 languages, with a special focus on Russian. Despite a significantly smaller training budget than comparable models, Gamayun outperforms LLaMA3.2-1B (9T tokens) on all considered benchmarks and surpasses Qwen2.5-1.5B (18T tokens) on a wide range of English and multilingual tasks. It matches or exceeds Qwen3 (36T tokens) on most tasks outside advanced STEM, achieving state-of-the-art results in Russian, including on the MERA benchmark, among models of comparable size (1-2B parameters).
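The two-stage strategy described above can be pictured as a switch in the data-sampling mixture partway through pre-training: a balanced multilingual mixture first, then a mixture that up-samples high-quality English while keeping a multilingual tail. The sketch below is illustrative only; the language list, the split of the 2.5T token budget, the corpus names, and the sampling weights are assumptions, not values reported by the paper.

```python
# Minimal sketch of a two-stage pre-training data schedule (assumptions marked inline).
import random
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    token_budget: float        # tokens allotted to this stage (assumed split of the 2.5T total)
    sampling_weights: dict     # corpus name -> sampling probability


# Hypothetical set of 12 languages; the paper only states that 12 are supported.
LANGS = ["ru", "en", "de", "fr", "es", "zh", "ar", "hi", "pt", "it", "ja", "tr"]

# Stage 1: balanced multilingual training for cross-lingual alignment.
stage1 = Stage(
    name="balanced_multilingual",
    token_budget=1.5e12,
    sampling_weights={lang: 1 / len(LANGS) for lang in LANGS},
)

# Stage 2: high-quality English enrichment, with a residual multilingual tail
# so that the alignment learned in stage 1 is not forgotten (weights are assumed).
stage2 = Stage(
    name="english_enrichment",
    token_budget=1.0e12,
    sampling_weights={"en_high_quality": 0.6, **{lang: 0.4 / len(LANGS) for lang in LANGS}},
)


def batches(stage: Stage, tokens_per_batch: int = 4_000_000):
    """Yield (corpus, n_tokens) draws until the stage's token budget is spent."""
    corpora, weights = zip(*stage.sampling_weights.items())
    spent = 0
    while spent < stage.token_budget:
        corpus = random.choices(corpora, weights=weights, k=1)[0]
        yield corpus, tokens_per_batch
        spent += tokens_per_batch


# Training would consume stage1 batches first, then stage2 batches.
```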
Community
Hello! Great work, the new Russian pretrain is inspiring! I wanted to know if you plan to release (1) the model and (2) the RuBIN dataset?
Hello, @RefalMachine! Thanks for your interest!
Unfortunately, our supervisors didn't allow us to publish the weights of the current model: it needs to better fit our publishing standards and have fewer copyright issues.
Good news: the second version of the model, trained from scratch once more, is underway. Although it required over 1T additional tokens to recover, it should now be free of those issues (at least the critical ones). We have also scaled compute by more than 16x, and it already yields better results both on benchmarks and in subjective conversations.
As for the RuBIN benchmark, at the moment you can access it by request. We are planning to release it publicly so everyone can evaluate on it.
Stay tuned! :3