RedHatAI/gemma-4-31B-it-FP8-Dynamic

compressed-tensors

bdellabe committed on Apr 13

Commit a5c5340 · verified · 1 parent: fa85889

Update README.md

Files changed (1)

README.md +25 -1

@@ -36,4 +36,28 @@ FP8:
 |------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
 |gsm8k_platinum_cot_llama|      3|flexible-extract|     5|exact_match|↑  |0.9768|±  |0.0043|
 |                        |       |strict-match    |     5|exact_match|↑  |0.9777|±  |0.0043|
-```

+```
+## Creation
+This model was created by applying data-free FP8 Dynamic quantization with [LLM Compressor](https://github.com/vllm-project/llm-compressor), as presented in the code snippet below.
+<details>
+```python
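+from llmcompressor import model_free_ptq
+
+# Source checkpoint and output directory (suffix matches the FP8_DYNAMIC scheme)
+MODEL_ID = "google/gemma-4-31B-it"
+SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
+
+# Data-free quantization: no calibration dataset is passed
+model_free_ptq(
+    model_stub=MODEL_ID,
+    save_directory=SAVE_DIR,
+    scheme="FP8_DYNAMIC",
+    # Modules matching these names/regexes are left unquantized
+    ignore=["re:.*vision.*", "lm_head", "re:.*embed_tokens.*"],
+    max_workers=8,
+    device="cuda:0",
+)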
+```
+</details>
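
Review note: for readers unfamiliar with the scheme name, the sketch below illustrates the general idea behind dynamic FP8 quantization, where a scale is derived from each token's values at runtime instead of from a calibration dataset. It is a pure-Python illustration, not LLM Compressor's implementation. The helper names (`round_to_e4m3`, `quantize_per_token`, `dequantize`) are hypothetical, and the assumption that `FP8_DYNAMIC` uses per-token activation scales in the E4M3 format is not stated in this commit.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    _, exp = math.frexp(abs(x))    # abs(x) = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, -6)           # unbiased exponent, floored at the subnormal range
    step = 2.0 ** (e - 3)          # spacing between neighbours (3 mantissa bits)
    return round(x / step) * step  # Python's round() rounds ties to even


def quantize_per_token(activations):
    """Hypothetical helper: one scale per token (row), computed from the data itself."""
    quantized, scales = [], []
    for row in activations:
        amax = max(abs(v) for v in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        quantized.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return quantized, scales


def dequantize(quantized, scales):
    """Map FP8 values back to the original range using the per-token scales."""
    return [[q * s for q in row] for row, s in zip(quantized, scales)]


# Two "tokens" with very different magnitudes each get their own scale.
acts = [[0.02, -0.5, 0.13], [12.0, -3.5, 0.25]]
q, s = quantize_per_token(acts)
recon = dequantize(q, s)
for original, restored in zip(acts, recon):
    print(original, "->", [round(v, 4) for v in restored])
```

Because each row is rescaled so that its largest magnitude maps to 448, both the small-valued and the large-valued token use the full FP8 range, which is the motivation for computing scales dynamically.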