A newer version of this model is available: MihaiPopa-1/CinnabarLM-4M-Base

CinnabarLM

CinnabarLM is a tiny, 4M-parameter LLM trained for ~33 minutes on a T4 GPU (on Colab)! It's only 16 MB in size!

Why?

Because tiny LLMs are worth building. Other people have already done it with MicroLM, Spark 4 5M and Tenete 8M, but I hadn't!

Model Configurations

| Parameter | Value |
|---|---|
| Tokenizer | Custom BPE tokenizer |
| Vocabulary Size | 4096 tokens |
| Batch Size | 64 |
| Context Window | 256 tokens |
| n_embed | 192 |
| n_head | 8 |
| n_layer | 6 |
| Dropout | 0.1 |
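
As a rough sanity check on the "4M" figure, the values above can be plugged into a back-of-the-envelope weight count. This is only a sketch: it assumes a GPT-2-style decoder block (four attention projections plus an MLP that is 4x wider than `n_embed`) with learned positional embeddings, and it ignores biases and LayerNorm weights. None of that is confirmed by this card, and `estimate_params` is a hypothetical helper, not part of the model's code.

```python
def estimate_params(vocab_size, context, n_embed, n_layer, tied_head=True):
    """Rough weight count for an assumed GPT-2-style decoder.

    Biases and LayerNorm parameters are ignored, so the result is a
    slight underestimate of the real total.
    """
    attn = 4 * n_embed * n_embed        # Q, K, V and output projections
    mlp = 2 * n_embed * (4 * n_embed)   # up- and down-projection, assumed 4x width
    blocks = n_layer * (attn + mlp)
    tok_emb = vocab_size * n_embed      # token embedding table
    pos_emb = context * n_embed         # assumed learned positional embeddings
    head = 0 if tied_head else vocab_size * n_embed  # separate output layer, if untied
    return blocks + tok_emb + pos_emb + head


# Values taken from the table above.
tied = estimate_params(4096, 256, 192, 6, tied_head=True)
untied = estimate_params(4096, 256, 192, 6, tied_head=False)
head_dim = 192 // 8                     # n_embed / n_head

print(f"tied head:   {tied:,} weights")
print(f"untied head: {untied:,} weights")
print(f"per-head dimension: {head_dim}")
print(f"untied size at 4 bytes per weight: {untied * 4 / 2**20:.1f} MiB")
```

Under these assumptions the count comes out at about 3.5M weights with a tied output layer and about 4.3M with an untied one. The untied figure at 4 bytes per weight is roughly 16.3 MiB, which is close to the stated 16 MB, though the card doesn't say which precision is used or whether the embeddings are tied.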

Training Configurations

| Hyperparameter | Value |
|---|---|
| max_iters | 10000 |
| eval_interval | 500 |
| learning_rate | 6e-4 |
| min_lr | 6e-5 |
| warmup_iters | 500 |
| weight_decay | 0.1 |
| beta1, beta2 | 0.9, 0.95 |
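
The card doesn't say how `learning_rate`, `min_lr` and `warmup_iters` are combined. One common arrangement for hyperparameters with these names is a linear warmup followed by a cosine decay down to `min_lr`, so the sketch below assumes that shape. It's an illustration of how such a schedule could look, not the training script used here, and `get_lr` is a hypothetical helper.

```python
import math

# Values taken from the table above.
LEARNING_RATE = 6e-4
MIN_LR = 6e-5
WARMUP_ITERS = 500
MAX_ITERS = 10000


def get_lr(it):
    """Learning rate at iteration `it` under an ASSUMED warmup + cosine schedule."""
    # Linear warmup from near zero up to the peak learning rate.
    if it < WARMUP_ITERS:
        return LEARNING_RATE * (it + 1) / WARMUP_ITERS
    # After the last iteration, stay at the floor.
    if it >= MAX_ITERS:
        return MIN_LR
    # Cosine decay from the peak down to MIN_LR.
    progress = (it - WARMUP_ITERS) / (MAX_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


for it in (0, 499, 500, 5250, 10000):
    print(it, f"{get_lr(it):.2e}")
```

With this shape the rate peaks at 6e-4 right as warmup ends and reaches the 6e-5 floor at iteration 10000. The `weight_decay` and `beta1, beta2` entries look like AdamW-style optimizer settings, but the optimizer isn't named in this card either.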

Limitations

  • Not Instruction-Tuned: It's only a base model, so it only completes text.
  • English-Only: It's trained on English data (FineWeb), so it's NOT multilingual.
  • Not a Standard Model: It's NOT a Qwen/Llama/GPT model, so the standard Transformers library can't recognize it!
  • Preview: This is a preview version, and it often generates gibberish. CinnabarLM 1 is meant to solve this with Llama.

Some other details

  • It's trained on 80 million tokens of FineWeb (CC-MAIN-2025-26 snapshot), so the knowledge cutoff is June 2025.
  • I picked the name "CinnabarLM" by combining "Cinnabar" (the new block from the Chaos Cubed drop in Minecraft) and "LM" (Language Model).
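
The numbers in this card also give a rough idea of how much data the model saw. The sketch below assumes that every iteration processes exactly one batch of `Batch Size` x `Context Window` tokens, with no gradient accumulation, and that the ~33 minutes covers the whole run. Both are assumptions, so treat the results as estimates.

```python
# Values taken from this card.
batch_size = 64
context = 256
max_iters = 10_000
dataset_tokens = 80_000_000
train_minutes = 33                       # approximate, per the intro

# ASSUMPTION: one batch per iteration, no gradient accumulation.
tokens_per_iter = batch_size * context
tokens_seen = tokens_per_iter * max_iters
passes = tokens_seen / dataset_tokens    # how many times the data was cycled through
tokens_per_sec = tokens_seen / (train_minutes * 60)

print(f"tokens per iteration: {tokens_per_iter:,}")
print(f"tokens seen in training: {tokens_seen:,}")
print(f"approximate passes over the dataset: {passes:.2f}")
print(f"approximate throughput: {tokens_per_sec:,.0f} tokens/s")
```

Under these assumptions the model sees about 164 million tokens, or roughly two passes over the 80 million token dataset.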