Ministral-3-14B-DeLERP-0.5-Instruct-Multimodal
A 50/50 DeLERP merge of Ministral 3 14B Base and Instruct models, preserving full multimodal (vision) capabilities.
Model Details
- Base Model: mistralai/Ministral-3-14B-Base-2512
- Target Model: mistralai/Ministral-3-14B-Instruct-2512-BF16
- Architecture: Mistral3ForConditionalGeneration (multimodal)
- Merge Method: DeLERP (Decomposed Linear Interpolation)
- Mix Ratio: 50% Base / 50% Instruct
- Precision: BF16
What is DeLERP?
DeLERP (Decomposed Linear Interpolation) separates the merging of model weights into two components:
- Direction: Uses NLERP (normalized linear interpolation) to blend the directional components of weight tensors
- Magnitude: Preserves the maximum magnitude from either source model to maintain importance signals
This approach can produce more robust merged models than plain linear interpolation (LERP) or spherical linear interpolation (SLERP).
Reference: DeLERP Merge Method (https://huggingface.co/blog/grimjim/delerp-merge-method)
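The two components described above can be illustrated with a minimal sketch. It is not the merge script used for this model: it assumes each tensor is flattened to a plain Python list, that the magnitude is the L2 norm of the whole tensor, and that NLERP means "normalize both inputs, interpolate linearly, renormalize". The function name `delerp`, the `eps` guard, and the fallback to plain LERP for degenerate inputs are all illustrative choices; a real merge would operate on torch tensors.

```python
import math


def delerp(a, b, t, eps=1e-12):
    """Blend vectors a and b: NLERP for direction, max norm for magnitude.

    t is the weight given to b (t=0.5 is a 50/50 mix).
    Assumption: norms are taken over the whole flattened tensor.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))

    # Plain LERP, used as a fallback when a direction is undefined.
    lerp = [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    if norm_a < eps or norm_b < eps:
        return lerp

    # Direction: interpolate the unit vectors, then renormalize (NLERP).
    blended = [(1.0 - t) * x / norm_a + t * y / norm_b for x, y in zip(a, b)]
    norm_blended = math.sqrt(sum(x * x for x in blended))
    if norm_blended < eps:  # inputs point in opposite directions
        return lerp

    # Magnitude: keep the larger of the two source norms.
    scale = max(norm_a, norm_b) / norm_blended
    return [x * scale for x in blended]


merged = delerp([3.0, 0.0], [0.0, 4.0], t=0.5)
print(merged, math.sqrt(sum(x * x for x in merged)))
```

Under these assumptions the merged vector always carries the larger source norm, so even at `t=0` the result is the first input rescaled rather than an exact copy of it.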
Merge Configuration
- Embedding layers (embed_tokens, lm_head): Copied directly from Instruct model (no merge)
- Vision encoder layers: Merged using DeLERP at 50% mix
- Language model layers: Merged using DeLERP at 50% mix
- Tokenizer/Config: From Instruct model
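The per-layer policy above could be expressed as a small routing function. This is a sketch only: the substring match on `embed_tokens` and `lm_head` follows the list above, but the function name `merge_action`, the action labels, and the example tensor names are hypothetical and may not match the checkpoint's actual key layout.

```python
# Tensors whose names contain these substrings are copied from the Instruct model.
COPY_FROM_INSTRUCT = ("embed_tokens", "lm_head")


def merge_action(tensor_name):
    """Return which rule applies to a tensor, based on its name."""
    if any(key in tensor_name for key in COPY_FROM_INSTRUCT):
        return "copy_instruct"
    # Everything else (vision encoder and language model layers) is merged.
    return "delerp"


# Hypothetical tensor names, for illustration only.
for name in (
    "language_model.model.embed_tokens.weight",
    "language_model.lm_head.weight",
    "vision_tower.layers.0.attention.q_proj.weight",
    "language_model.model.layers.0.mlp.down_proj.weight",
):
    print(name, "->", merge_action(name))
```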
Evaluation Results (Text-Only Version)
Quick note from the author: Claude wrote this README but did not mention that some of these runs did not use the right chat template (I think the 0.9 and the 0.99 variants), so treat those rows with caution.
In instruction-following benchmarks comparing DeLERP variants:
| Model | Mean Score | At Least 1 Correct | Complete Failure |
|---|---|---|---|
| DeLERP-0.5 | 55.8% | 88.4% | 11.6% |
| Instruct Baseline | 52.4% | 72.6% | 27.4% |
| DeLERP-0.8 | 52.1% | 75.0% | 25.0% |
| DeLERP-0.9 | 48.6% | 72.6% | 27.4% |
| DeLERP-0.99 | 49.3% | 69.8% | 30.2% |
The 0.5 mix ratio showed:
- Highest mean instruction-following score
- Lowest complete failure rate
- More concise responses (~19% shorter than baseline)
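The table can be sanity-checked with a few lines of arithmetic. The sketch below copies the figures from the table, confirms that "Complete Failure" is the complement of "At Least 1 Correct" in every row, and computes percentage-point differences against the Instruct baseline. The helper names `complement_ok` and `delta_vs_baseline` are illustrative.

```python
# (mean score, at least 1 correct, complete failure), all in percent.
rows = {
    "DeLERP-0.5": (55.8, 88.4, 11.6),
    "Instruct Baseline": (52.4, 72.6, 27.4),
    "DeLERP-0.8": (52.1, 75.0, 25.0),
    "DeLERP-0.9": (48.6, 72.6, 27.4),
    "DeLERP-0.99": (49.3, 69.8, 30.2),
}


def complement_ok(table, tol=1e-6):
    """True if 'at least 1 correct' + 'complete failure' is 100% in every row."""
    return all(abs(ok + fail - 100.0) < tol for _, ok, fail in table.values())


def delta_vs_baseline(table, model, baseline="Instruct Baseline"):
    """Percentage-point differences between a model and the baseline row."""
    return tuple(round(m - b, 1) for m, b in zip(table[model], table[baseline]))


print(complement_ok(rows))
print(delta_vs_baseline(rows, "DeLERP-0.5"))
```

For the 0.5 mix this gives +3.4 points on mean score and 15.8 fewer points of complete failure relative to the baseline.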
Usage
from transformers import AutoProcessor, Mistral3ForConditionalGeneration
import torch
model_id = "rpDungeon/Ministral-3-14B-DeLERP-0.5-Instruct-Multimodal"
processor = AutoProcessor.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Text-only example
messages = [{"role": "user", "content": [{"type": "text", "text": "Explain quantum computing in simple terms."}]}]
# tokenize=True and return_dict=True return model-ready tensors instead of a prompt string
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))
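Since the merge keeps the vision encoder, an image prompt should be possible as well. The sketch below only builds the message list, using the structured content format that transformers chat templates accept for multimodal models. The helper name `build_vision_messages` and the image URL are placeholders, and this path has not been verified against this specific checkpoint.

```python
def build_vision_messages(image_url, prompt):
    """Build a single-turn chat message containing one image and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Placeholder URL, for illustration only.
messages = build_vision_messages(
    "https://example.com/cat.png",
    "Describe this image in one sentence.",
)
print(messages)
```

The resulting list would be passed to `processor.apply_chat_template` with `tokenize=True`, `return_dict=True`, and `return_tensors="pt"`, then to `model.generate` in the same way as a text-only prompt.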
License
This model inherits the license from the original Mistral models. Please refer to Mistral AI's licensing terms.
Citation
If you use this model, please cite:
- The original Mistral AI team for the base models
- The DeLERP merge method: https://huggingface.co/blog/grimjim/delerp-merge-method