How to use ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx with MLX:
# Download the model from the Hub
# (the quotes keep zsh, the default macOS shell, from expanding the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Voxtral-Mini-4B-Realtime-8bit-mlx ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx
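The same download can also be scripted from Python. The following is a minimal sketch that assumes the `huggingface_hub` package is installed; `download_model` is a hypothetical helper name introduced here for illustration, while `snapshot_download` is the library function that fetches a whole repository.

```python
from pathlib import Path

REPO_ID = "ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx"
LOCAL_DIR = Path("Voxtral-Mini-4B-Realtime-8bit-mlx")


def download_model(repo_id: str = REPO_ID, local_dir: Path = LOCAL_DIR) -> Path:
    """Download every file of the repo into local_dir and return that path.

    Hypothetical helper, not part of any library.
    """
    # Imported here so this file can be loaded even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return Path(path)


# Usage (requires network access and several GB of free disk space):
# model_dir = download_model()
# print(sorted(p.name for p in model_dir.iterdir()))
```

Calling the helper a second time should reuse files that are already present rather than fetch them again.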
Voxtral Mini 4B Realtime - 8-bit MLX
This is an 8-bit quantized MLX version of Voxtral Mini 4B Realtime by Mistral AI, converted using voxmlx.
This version was created for use with Supervoxtral, enabling fast real-time transcription on macOS.
Model Details
- Base model: mistralai/Voxtral-Mini-4B-Realtime-2602
- Quantization: 8-bit (group size 64)
- Framework: MLX
- Parameters: ~4B (3.4B language model + 970M audio encoder)
- License: Apache 2.0
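To make "8-bit (group size 64)" concrete, here is a small pure-Python sketch of group-wise affine quantization: each run of 64 weights shares one scale and one bias, and each weight is stored as an integer code in 0..255. This illustrates the general idea only. It is not the voxmlx or MLX implementation, whose rounding and packing details may differ, and the names `quantize` and `dequantize` are chosen here for illustration.

```python
from typing import List, Tuple

GROUP_SIZE = 64  # group size stated in the model card
BITS = 8         # bit width stated in the model card

Group = Tuple[List[int], float, float]  # (codes, scale, bias)


def quantize(weights: List[float], group_size: int = GROUP_SIZE,
             bits: int = BITS) -> List[Group]:
    """Split weights into groups and store each as integer codes + scale + bias."""
    if len(weights) % group_size != 0:
        raise ValueError("length must be a multiple of group_size")
    levels = (1 << bits) - 1  # 255 distinct steps for 8 bits
    groups: List[Group] = []
    for start in range(0, len(weights), group_size):
        chunk = weights[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / levels
        if scale == 0.0:  # constant group: any non-zero scale works
            scale = 1.0
        codes = [min(levels, max(0, round((w - lo) / scale))) for w in chunk]
        groups.append((codes, scale, lo))
    return groups


def dequantize(groups: List[Group]) -> List[float]:
    """Reconstruct approximate weights as code * scale + bias."""
    return [code * scale + bias for codes, scale, bias in groups for code in codes]


# Round trip on synthetic data: the error per weight is at most half a step.
weights = [((i * 37) % 101) / 50.0 - 1.0 for i in range(128)]
restored = dequantize(quantize(weights))
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_error:.5f}")
```

A smaller group size tracks local weight ranges more closely at the cost of storing more scales and biases; 64 is the value this conversion uses.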
Description
Voxtral Mini 4B Realtime is a speech-to-text model that supports 13+ languages with sub-500 ms latency. This version has been quantized to 8-bit precision for efficient inference on Apple Silicon using the MLX framework.
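A rough back-of-the-envelope estimate shows why 8-bit quantization helps on memory-constrained machines. The sketch below assumes that every parameter is quantized and that each group of 64 weights carries one 16-bit scale and one 16-bit bias; both are simplifying assumptions, since some layers may be kept in higher precision, so treat the result as an approximation rather than the actual file size.

```python
# Parameter counts from the model card (approximate).
LANGUAGE_MODEL_PARAMS = 3_400_000_000
AUDIO_ENCODER_PARAMS = 970_000_000
TOTAL_PARAMS = LANGUAGE_MODEL_PARAMS + AUDIO_ENCODER_PARAMS


def bits_per_weight(bits: int = 8, group_size: int = 64, meta_bits: int = 32) -> float:
    """Effective storage per weight: the code plus its share of scale and bias.

    meta_bits = 32 assumes a 16-bit scale and a 16-bit bias per group.
    """
    return bits + meta_bits / group_size


def size_gb(n_params: int, bits_each: float) -> float:
    """Storage in decimal gigabytes for n_params weights at bits_each bits."""
    return n_params * bits_each / 8 / 1e9


quantized_gb = size_gb(TOTAL_PARAMS, bits_per_weight())  # 8.5 bits per weight
half_precision_gb = size_gb(TOTAL_PARAMS, 16)            # 16-bit baseline

print(f"8-bit, group size 64: ~{quantized_gb:.2f} GB")
print(f"16-bit baseline:      ~{half_precision_gb:.2f} GB")
```

Under these assumptions the quantized weights take a little over half the space of a 16-bit copy.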
Credits
- Original model by Mistral AI
- MLX conversion tooling by voxmlx
Model tree for ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx
- Base model: mistralai/Ministral-3-3B-Base-2512
- Finetuned: mistralai/Voxtral-Mini-4B-Realtime-2602
- Quantized: ellamind/Voxtral-Mini-4B-Realtime-8bit-mlx (this model)