
Ministral-3-14B-DeLERP-0.5-Instruct-Multimodal

A 50/50 DeLERP merge of Ministral 3 14B Base and Instruct models, preserving full multimodal (vision) capabilities.

Model Details

Base Model: mistralai/Ministral-3-14B-Base-2512
Target Model: mistralai/Ministral-3-14B-Instruct-2512-BF16
Architecture: Mistral3ForConditionalGeneration (multimodal)
Merge Method: DeLERP (Decomposed Linear Interpolation)
Mix Ratio: 50% Base / 50% Instruct
Precision: BF16

What is DeLERP?

DeLERP (Decomposed Linear Interpolation) separates the merging of model weights into two components:

Direction: Uses NLERP (normalized linear interpolation) to blend the directional components of weight tensors
Magnitude: Preserves the maximum magnitude from either source model to maintain importance signals

This approach can produce more robust merged models than plain linear interpolation (LERP) or spherical linear interpolation (SLERP), both of which treat direction and magnitude together.

Reference: DeLERP Merge Method (https://huggingface.co/blog/grimjim/delerp-merge-method)

Merge Configuration

Embedding layers (embed_tokens, lm_head): Copied directly from Instruct model (no merge)
Vision encoder layers: Merged using DeLERP at 50% mix
Language model layers: Merged using DeLERP at 50% mix
Tokenizer/Config: From Instruct model
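One way to picture the configuration above is as a per-tensor rule keyed on the parameter name. The sketch below is only an illustration of that rule: the name fragments `embed_tokens` and `lm_head` come from the list above, while the function name `merge_policy` and the full tensor names in the loop are hypothetical and may not match the actual checkpoint keys.

```python
# Name fragments taken from the configuration list above.
COPY_FROM_INSTRUCT = ("embed_tokens", "lm_head")

def merge_policy(tensor_name):
    """Return 'copy_instruct' or 'delerp' for a given tensor name."""
    if any(fragment in tensor_name for fragment in COPY_FROM_INSTRUCT):
        return "copy_instruct"
    # Everything else (vision encoder and language model layers) is merged.
    return "delerp"

# Hypothetical tensor names, for illustration only.
for name in (
    "language_model.model.embed_tokens.weight",
    "language_model.lm_head.weight",
    "language_model.model.layers.0.self_attn.q_proj.weight",
    "vision_tower.transformer.layers.0.attention.q_proj.weight",
):
    print(name, "->", merge_policy(name))
```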

Evaluation Results (Text-Only Version)

Quick note: Claude wrote this README and did not mention that some of these runs used the wrong chat template (I think the 0.9 and the 0.99 variants), so treat those rows with caution.

In instruction-following benchmarks comparing DeLERP variants:

Model	Mean Score	At Least 1 Correct	Complete Failure
DeLERP-0.5	55.8%	88.4%	11.6%
Instruct Baseline	52.4%	72.6%	27.4%
DeLERP-0.8	52.1%	75.0%	25.0%
DeLERP-0.9	48.6%	72.6%	27.4%
DeLERP-0.99	49.3%	69.8%	30.2%

The 0.5 mix ratio showed:

Highest mean instruction-following score
Lowest complete failure rate
More concise responses (~19% shorter than baseline)
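The comparison can be checked directly from the table. The short script below copies the reported percentages, confirms that "At Least 1 Correct" and "Complete Failure" sum to 100% in every row (which suggests, though the source does not state it, that the two columns are complements), and computes the mean-score gap between the best variant and the Instruct baseline. The variable names are illustrative.

```python
# (mean score, at least 1 correct, complete failure), in percent, from the table.
results = {
    "DeLERP-0.5": (55.8, 88.4, 11.6),
    "Instruct Baseline": (52.4, 72.6, 27.4),
    "DeLERP-0.8": (52.1, 75.0, 25.0),
    "DeLERP-0.9": (48.6, 72.6, 27.4),
    "DeLERP-0.99": (49.3, 69.8, 30.2),
}

for name, (mean, at_least_one, failure) in results.items():
    # The last two columns appear to be complements, so they should total 100%.
    assert abs(at_least_one + failure - 100.0) < 1e-6, name

# Pick the row with the highest mean score and compare it to the baseline.
best = max(results, key=lambda name: results[name][0])
gain = results[best][0] - results["Instruct Baseline"][0]
print(f"{best}: {gain:+.1f} points mean score vs. Instruct Baseline")
```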

Usage

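from transformers import AutoProcessor, Mistral3ForConditionalGeneration
import torch

model_id = "rpDungeon/Ministral-3-14B-DeLERP-0.5-Instruct-Multimodal"

processor = AutoProcessor.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Text-only example (processor chat templates expect a list of content parts)
messages = [{"role": "user", "content": [{"type": "text", "text": "Explain quantum computing in simple terms."}]}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    tokenize=True,               # return token ids rather than a prompt string
    return_dict=True,            # needed so **inputs can be unpacked below
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(new_tokens, skip_special_tokens=True))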

License

This model inherits the license from the original Mistral models. Please refer to Mistral AI's licensing terms.

Citation

If you use this model, please cite:

The original Mistral AI team for the base models
The DeLERP merge method: https://huggingface.co/blog/grimjim/delerp-merge-method
