---
base_model:
- Darkhn/Magistral-Small-2509-Text-Only
---

FP8 W8A8 quant of Darkhn/Magistral-Small-2509-Text-Only, because vLLM seems to take issue with the Pixtral vision setup for me. All thanks to Darkhn/Magistral-Small-2509-Text-Only for uploading the no-vision checkpoint.

Recipe:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Darkhn/Magistral-Small-2509-Text-Only"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Configure the simple PTQ recipe: dynamic FP8 on all Linear layers,
# leaving lm_head in full precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply the quantization algorithm.
oneshot(model=model, recipe=recipe)

# Save the quantized model and tokenizer.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```
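For reference, a quick sanity check of the output-directory name the recipe's `SAVE_DIR` expression produces, with the intended vLLM invocation sketched in a comment (the `vllm serve` line is an assumption based on vLLM's usual handling of llm-compressor FP8 checkpoints; adjust flags for your setup):

```python
# Reproduce the recipe's SAVE_DIR derivation: strip any trailing slash,
# take the repo name after the org prefix, and append the scheme suffix.
MODEL_ID = "Darkhn/Magistral-Small-2509-Text-Only"
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-Dynamic"
print(SAVE_DIR)  # Magistral-Small-2509-Text-Only-FP8-Dynamic

# The saved directory can then be pointed at directly, e.g. (assumed usage):
#   vllm serve ./Magistral-Small-2509-Text-Only-FP8-Dynamic
```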