Instructions to use cerebras/MiniMax-M2-REAP-162B-A10B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use cerebras/MiniMax-M2-REAP-162B-A10B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cerebras/MiniMax-M2-REAP-162B-A10B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; outside a notebook the bare call's return value is discarded
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cerebras/MiniMax-M2-REAP-162B-A10B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("cerebras/MiniMax-M2-REAP-162B-A10B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use cerebras/MiniMax-M2-REAP-162B-A10B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cerebras/MiniMax-M2-REAP-162B-A10B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cerebras/MiniMax-M2-REAP-162B-A10B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
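
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "cerebras/MiniMax-M2-REAP-162B-A10B"


def build_chat_request(prompt, model=MODEL_ID, base_url="http://localhost:8000"):
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat("What is the capital of France?"))` should return the model's answer. Passing `base_url="http://localhost:30000"` would target a server on a different port.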

Use Docker

docker model run hf.co/cerebras/MiniMax-M2-REAP-162B-A10B

SGLang

How to use cerebras/MiniMax-M2-REAP-162B-A10B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cerebras/MiniMax-M2-REAP-162B-A10B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cerebras/MiniMax-M2-REAP-162B-A10B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cerebras/MiniMax-M2-REAP-162B-A10B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cerebras/MiniMax-M2-REAP-162B-A10B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use cerebras/MiniMax-M2-REAP-162B-A10B with Docker Model Runner:
```
docker model run hf.co/cerebras/MiniMax-M2-REAP-162B-A10B
```

We need 50 or 60% expert pruning please

by hxssgaa - opened Nov 15, 2025

Discussion

hxssgaa

Nov 15, 2025

A more aggressively pruned version would be very useful for running this on a single 96GB H200 or RTX Pro 6000 with 4-bit quantisation. I don't mind sacrificing a little more performance.

ha1ry

Nov 15, 2025

edited Nov 15, 2025

You can fit 30% IQ4_XS in it.

hxssgaa

Nov 15, 2025

edited Nov 15, 2025

> You can fit 30% IQ4_XS on it.

Model weights take about 80 GB, which leaves 16 GB for the KV cache. That is a bit tight for long-horizon tasks like Claude Code, which easily require 64K context.
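
As a rough sanity check on the weight figure, here is a back-of-the-envelope sketch. It assumes the 162B total parameter count implied by the model name and a nominal 4.25 bits per weight for IQ4_XS; real GGUF files mix quantization types across tensors, so actual file sizes will differ somewhat. The function name is hypothetical.

```python
GIB = 1024 ** 3


def quantized_weight_bytes(num_params, bits_per_weight):
    """Approximate weight storage: parameters times bits per weight, in bytes."""
    return num_params * bits_per_weight / 8


# Assumed inputs: 162e9 parameters (from the model name), 4.25 bits per weight
size_30pct = quantized_weight_bytes(162e9, 4.25)
print(f"30% REAP at ~4.25 bpw: {size_30pct / GIB:.1f} GiB")
```

Under these assumptions the estimate lands close to the figure quoted above, which is why little of a 96 GB card is left over for the KV cache.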

0xSero

Nov 17, 2025

Quantize the KV cache.
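
To illustrate why this helps, the sketch below estimates KV cache size for a grouped-query-attention model. The architecture values (62 layers, 8 KV heads, head dimension 128) are assumptions made for illustration and should be checked against the model's `config.json`; the function name is hypothetical, and real servers add paging and scale-factor overhead on top.

```python
GIB = 1024 ** 3


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_element=2.0):
    """Keys plus values for every layer, KV head, and token in the context."""
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_element)


# Assumed architecture values; verify against config.json before relying on them
LAYERS, KV_HEADS, HEAD_DIM = 62, 8, 128
CONTEXT = 64 * 1024  # 64K tokens

fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bytes_per_element=2.0)
int8 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bytes_per_element=1.0)
print(f"16-bit KV cache at 64K: {fp16 / GIB:.2f} GiB")
print(f" 8-bit KV cache at 64K: {int8 / GIB:.2f} GiB")
```

With these assumed values, a 16-bit cache for a single 64K-token sequence comes to about 15.5 GiB, and an 8-bit cache to half that, which would free several gigabytes of headroom on a 96 GB card.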

lazarevich

Cerebras org Nov 17, 2025

edited Nov 17, 2025

@hxssgaa @ha1ry @0xSero hey folks, we just dropped a 40% REAP: https://huggingface.co/cerebras/MiniMax-M2-REAP-139B-A10B
We do see a slightly bigger drop of a few percentage points on some benchmarks. Please let us know if you see issues with the model!
