Voxtral Mini 4B Realtime - 8-bit MLX

This is an 8-bit quantized MLX version of Voxtral Mini 4B Realtime by Mistral AI, converted using voxmlx.

This version was created for use with Supervoxtral, enabling fast realtime transcription on macOS.

Model Details

Base model: mistralai/Voxtral-Mini-4B-Realtime-2602
Quantization: 8-bit (group size 64)
Framework: MLX
Parameters: ~4B (3.4B language model + 970M audio encoder)
License: Apache 2.0
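
To make "8-bit (group size 64)" concrete, here is a minimal pure-Python sketch of group-wise affine quantization: each run of 64 weights shares one scale and one bias, and every weight is stored as an integer code from 0 to 255. This illustrates the general idea only. The function names are hypothetical, and the exact rounding and storage layout used by MLX or voxmlx may differ.

```python
import random

def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1               # 255 distinct steps for 8-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo                # lo acts as the per-group bias

def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]

def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]

def dequantize(groups):
    """Reassemble the approximate weights from all quantized groups."""
    restored = []
    for codes, scale, bias in groups:
        restored.extend(dequantize_group(codes, scale, bias))
    return restored

# Illustrative data only: 256 random "weights", i.e. four groups of 64.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
groups = quantize(weights)
restored = dequantize(groups)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Because each group has its own scale, the round-trip error of any weight is bounded by half of that group's step size, which is why smaller groups trade extra metadata for better accuracy.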

Description

Voxtral Mini is a speech-to-text model that supports 13+ languages with sub-500ms latency. This version has been quantized to 8-bit precision for efficient inference on Apple Silicon using the MLX framework.
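
As a rough illustration of the memory saving, the sketch below estimates weight storage from the parameter counts listed under Model Details. It rests on two assumptions that are not stated in this card: every parameter is quantized, and each group of 64 weights carries a 16-bit scale and a 16-bit bias. Treat the result as a back-of-the-envelope estimate, not a measured file size.

```python
# Parameter counts taken from the Model Details section.
LM_PARAMS = 3.4e9
ENCODER_PARAMS = 970e6

GROUP_SIZE = 64
BITS = 8
SCALE_BITS = 16  # assumption: one 16-bit scale per group
BIAS_BITS = 16   # assumption: one 16-bit bias per group

def bits_per_weight(bits, group_size, scale_bits, bias_bits):
    """Effective bits per weight, including per-group metadata."""
    return bits + (scale_bits + bias_bits) / group_size

def footprint_gb(params, bpw):
    """Storage in gigabytes (10^9 bytes) for `params` weights at `bpw` bits each."""
    return params * bpw / 8 / 1e9

total_params = LM_PARAMS + ENCODER_PARAMS
bpw = bits_per_weight(BITS, GROUP_SIZE, SCALE_BITS, BIAS_BITS)
quantized_gb = footprint_gb(total_params, bpw)   # estimate if all weights are quantized
bf16_gb = footprint_gb(total_params, 16)         # unquantized 16-bit baseline

print(f"{bpw:.1f} bits/weight, ~{quantized_gb:.2f} GB vs ~{bf16_gb:.2f} GB in 16-bit")
```

Under these assumptions the quantized weights take a little over half the space of a 16-bit copy. The real figure depends on which layers the conversion leaves unquantized.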

Credits

Original model by Mistral AI
MLX conversion tooling by voxmlx

Model tree for ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx

Base model: mistralai/Ministral-3-3B-Base-2512
Finetuned: mistralai/Voxtral-Mini-4B-Realtime-2602
This model: ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx (8-bit quantized)