
Ministral-3-14B-DeLERP-0.5-Instruct-Multimodal

A 50/50 DeLERP merge of Ministral 3 14B Base and Instruct models, preserving full multimodal (vision) capabilities.

Model Details

What is DeLERP?

DeLERP (Decomposed Linear Interpolation) separates the merging of model weights into two components:

  1. Direction: Uses NLERP (normalized linear interpolation) to blend the directional components of weight tensors
  2. Magnitude: Preserves the maximum magnitude from either source model to maintain importance signals

This approach can produce more robust merged models than plain linear interpolation (LERP) or spherical interpolation (SLERP).

Reference: DeLERP Merge Method
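
As a rough illustration, here is a minimal sketch of DeLERP applied to a single pair of weight matrices. It assumes a per-row (output-channel) decomposition and a max-of-norms magnitude rule; the function name and details are illustrative, not the exact merge script used for this model.

import torch

def delerp(a: torch.Tensor, b: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    # Split each row into a magnitude and a unit direction.
    norm_a = a.norm(dim=-1, keepdim=True)
    norm_b = b.norm(dim=-1, keepdim=True)
    dir_a = a / (norm_a + eps)
    dir_b = b / (norm_b + eps)

    # Direction: NLERP -- linearly blend the unit vectors, then renormalize.
    blended = (1.0 - t) * dir_a + t * dir_b
    blended = blended / (blended.norm(dim=-1, keepdim=True) + eps)

    # Magnitude: keep the larger of the two source magnitudes per row.
    magnitude = torch.maximum(norm_a, norm_b)
    return blended * magnitude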

Merge Configuration

  • Embedding and output layers (embed_tokens, lm_head): Copied directly from the Instruct model (no merge)
  • Vision encoder layers: Merged using DeLERP at 50% mix
  • Language model layers: Merged using DeLERP at 50% mix
  • Tokenizer/Config: From Instruct model
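
For illustration only, a sketch of how this per-tensor routing could look in code, reusing the delerp helper above. The state-dict handling, tensor-name checks, and the treatment of 1-D tensors are assumptions, not the published merge script.

def merge_state_dicts(base_sd: dict, instruct_sd: dict, t: float = 0.5) -> dict:
    merged = {}
    for name, instruct_w in instruct_sd.items():
        if "embed_tokens" in name or "lm_head" in name:
            # Embedding/output layers: taken directly from the Instruct model.
            merged[name] = instruct_w.clone()
        elif instruct_w.ndim >= 2:
            # Vision encoder and language model weights: DeLERP at a 50% mix.
            merged[name] = delerp(base_sd[name], instruct_w, t)
        else:
            # 1-D tensors (norms, biases): plain LERP (an assumption; the card
            # does not say how these were handled).
            merged[name] = (1.0 - t) * base_sd[name] + t * instruct_w
    return merged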

Evaluation Results (Text-Only Version)

A quick note: this README was drafted by Claude, and it fails to mention that the correct chat template was not used for some of these runs (most likely the 0.9 and 0.99 variants).

In instruction-following benchmarks comparing DeLERP variants:

| Model | Mean Score | At Least 1 Correct | Complete Failure |
|---|---|---|---|
| DeLERP-0.5 | 55.8% | 88.4% | 11.6% |
| Instruct Baseline | 52.4% | 72.6% | 27.4% |
| DeLERP-0.8 | 52.1% | 75.0% | 25.0% |
| DeLERP-0.9 | 48.6% | 72.6% | 27.4% |
| DeLERP-0.99 | 49.3% | 69.8% | 30.2% |

The 0.5 mix ratio showed:

  • Highest mean instruction-following score
  • Lowest complete failure rate
  • More concise responses (~19% shorter than baseline)

Usage

from transformers import AutoProcessor, Mistral3ForConditionalGeneration
import torch

model_id = "rpDungeon/Ministral-3-14B-DeLERP-0.5-Instruct-Multimodal"

processor = AutoProcessor.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Text-only example
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
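
Since the merge is meant to preserve vision capabilities, an image example may also be useful. This is a hedged sketch: the image URL is a placeholder, and it assumes the processor's chat template accepts image entries with a "url" key, as in recent transformers releases.

# Image + text example (the URL below is only a placeholder)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device, dtype=torch.bfloat16)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))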

License

This model inherits the license from the original Mistral models. Please refer to Mistral AI's licensing terms.

Citation

If you use this model, please cite:
