Instructions for using ruslanmv/granite-3.1-2b-Reasoning with libraries and local apps.

Libraries

How to use ruslanmv/granite-3.1-2b-Reasoning with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ruslanmv/granite-3.1-2b-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ruslanmv/granite-3.1-2b-Reasoning")
model = AutoModelForCausalLM.from_pretrained("ruslanmv/granite-3.1-2b-Reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use ruslanmv/granite-3.1-2b-Reasoning with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ruslanmv/granite-3.1-2b-Reasoning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ruslanmv/granite-3.1-2b-Reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
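
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server from the previous step is listening on `localhost:8000`. The helper names `build_chat_request` and `chat` are illustrative and are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="ruslanmv/granite-3.1-2b-Reasoning",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt to the server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# print(chat("What is the capital of France?"))
```

Because the endpoint path is the same for any OpenAI-compatible server, passing `base_url="http://localhost:30000"` should work for the SGLang server described further down.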

Use Docker

docker model run hf.co/ruslanmv/granite-3.1-2b-Reasoning

SGLang

How to use ruslanmv/granite-3.1-2b-Reasoning with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ruslanmv/granite-3.1-2b-Reasoning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ruslanmv/granite-3.1-2b-Reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ruslanmv/granite-3.1-2b-Reasoning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ruslanmv/granite-3.1-2b-Reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ruslanmv/granite-3.1-2b-Reasoning with Docker Model Runner:
```
docker model run hf.co/ruslanmv/granite-3.1-2b-Reasoning
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Granite-3.1-2B-Reasoning (Fine-tuned for Logical Reasoning)

Model Overview

This model is a fine-tuned version of ibm-granite/granite-3.1-2b-instruct, tuned to improve performance on logical reasoning, structured problem-solving, and complex analytical tasks.

Developed by: ruslanmv
License: Apache 2.0
Base Model: ibm-granite/granite-3.1-2b-instruct
Fine-tuned for: Logical reasoning, structured problem-solving, long-context tasks
Supported Languages: English

Model Summary

Granite-3.1-2B-Reasoning builds on IBM’s Granite 3.1 language model series, which supports extended context lengths and strong multi-domain performance. This fine-tuned variant targets complex reasoning tasks.

Improvements Over Base Model:

✅ Improved reasoning and problem-solving skills
✅ Optimized for instruction-following and logical deduction
✅ Maintains the efficiency and robustness of Granite-3.1

Installation & Usage

Install the required dependencies:

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
pip install bitsandbytes  # needed for the 4-bit quantized loading shown below

Running the Model

Use the following Python snippet to load and generate text with the fine-tuned model:

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
)
import torch

# Model and tokenizer
model_name = "ruslanmv/granite-3.1-2b-Reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # or "cuda" if you have only one GPU
    torch_dtype=torch.float16,  # float16 for faster, less memory-intensive inference
    # 4-bit quantization for lower memory usage (requires bitsandbytes)
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
)

# Build the prompt
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""
text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Calculate pi."},
], tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)  # Move inputs to the model's device

# Sampling parameters
generation_config = GenerationConfig(
    do_sample=True,  # required for temperature and top_p to take effect
    temperature=0.8,
    top_p=0.95,
    max_new_tokens=1024,  # upper bound on the number of generated tokens
)

# Inference
with torch.inference_mode():  # Use inference mode for faster generation
    outputs = model.generate(**inputs, generation_config=generation_config)

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
output = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

print(output)

Example output:

<reasoning>
Pi is an irrational number, which means it cannot be precisely calculated using finite decimal or fractional notation. It is typically represented by the Greek letter π and its approximate value is 3.14159. However, for a more precise calculation, we can use mathematical algorithms like the Leibniz formula for π or the Gregory-Leibniz series.

The Leibniz formula for π is:

π = 4 * (1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + 1/13 - 1/15 +...)

This series converges slowly, so many terms are needed for a good approximation. For instance, using 10 terms, the approximation would be:

π ≈ 4 * (1 - 0.3333333333333333 + 0.1111111111111111 - 0.0344827586206897 + 0.0090040875518672 - 0.0025958422650073 + 0.0006929403729561 - 0.0001866279043531 + 0.0000499753694946 - 0.0000133386323746 + 0.0000035303398593 - 0.0000009009433996)

π ≈ 3.141592653589793

This is a rough approximation of π using 10 terms. For a more precise value, you can use more terms or employ other algorithms.

</reasoning>

<answer>
π ≈ 3.141592653589793
</answer>
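
The Leibniz series quoted in the sample output is easy to check numerically, which is worthwhile because sampled output from a small model can contain arithmetic slips. A minimal sketch follows; the function name `leibniz_pi` is illustrative.

```python
import math


def leibniz_pi(n_terms):
    """Approximate pi with the first n_terms of the Leibniz series:
    pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...)."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total


# Compare a few partial sums against math.pi
for n in (10, 1_000, 100_000):
    approx = leibniz_pi(n)
    print(f"{n:>7} terms: {approx:.10f}  (error {abs(math.pi - approx):.2e})")
```

With 10 terms the partial sum is about 3.04, so a value accurate to 15 decimal places cannot come from a 10-term sum. The error of this series shrinks roughly as 1/n, which matches the "converges slowly" remark in the reasoning.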

Intended Use

Granite-3.1-2B-Reasoning is designed for tasks requiring structured reasoning, including:

Logical and analytical problem-solving
Text-based reasoning tasks
Mathematical and symbolic reasoning
Advanced instruction-following
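
For downstream use in tasks like these, the `<reasoning>` and `<answer>` sections requested by the system prompt can be separated with a small parser. The sketch below assumes the model follows the requested format. `parse_response` is a hypothetical helper, and it returns `None` for any section the model omitted.

```python
import re


def parse_response(text):
    """Split a tagged response into its reasoning and answer parts."""
    def extract(tag):
        # Non-greedy match across newlines between <tag> and </tag>
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        return match.group(1).strip() if match else None

    return {"reasoning": extract("reasoning"), "answer": extract("answer")}


sample = """<reasoning>
2 + 2 equals 4.
</reasoning>
<answer>
4
</answer>"""

parsed = parse_response(sample)
print(parsed["answer"])  # prints "4"
```

Returning `None` instead of raising keeps the caller in control when a generation is truncated by `max_new_tokens` before the closing tag appears.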

License & Acknowledgments

This model is released under the Apache 2.0 license. It is fine-tuned from IBM’s Granite 3.1-2B-Instruct model. Special thanks to the IBM Granite Team for developing the base model.

For more details, visit the IBM Granite Documentation.

Citation

If you use this model in your research or applications, please cite:

@misc{ruslanmv2025granite,
  title={Fine-Tuning Granite-3.1 for Advanced Reasoning},
  author={Ruslan M.V.},
  year={2025},
  url={https://huggingface.co/ruslanmv/granite-3.1-2b-Reasoning}
}

Safetensors

Model size: 3B params
Tensor type: BF16

Model tree for ruslanmv/granite-3.1-2b-Reasoning

Base model: ibm-granite/granite-3.1-2b-base
Finetuned: ibm-granite/granite-3.1-2b-instruct
Finetuned (13): this model
Quantizations: 2 models