|------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_platinum_cot_llama| 3|flexible-extract| 5|exact_match|↑ |0.9768|± |0.0043|
| | |strict-match | 5|exact_match|↑ |0.9777|± |0.0043|
```
## Creation

This model was created by applying data-free FP8 Dynamic quantization with [LLM Compressor](https://github.com/vllm-project/llm-compressor), as shown in the code snippet below.

<details>

```python
from llmcompressor import model_free_ptq

MODEL_ID = "google/gemma-4-31B-it"
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-block"

# Data-free post-training quantization: no calibration dataset is needed.
model_free_ptq(
    model_stub=MODEL_ID,
    save_directory=SAVE_DIR,
    scheme="FP8_DYNAMIC",
    # Leave vision modules, lm_head, and token embeddings unquantized
    # (entries prefixed with "re:" are regular expressions).
    ignore=["re:.*vision.*", "lm_head", "re:.*embed_tokens.*"],
    max_workers=8,
    device="cuda:0",
)
```

</details>
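For intuition about what an FP8 Dynamic checkpoint contains, here is a minimal pure-Python sketch. It assumes the usual layout of the `FP8_DYNAMIC` preset: one static scale per output channel of each weight matrix, computed from the weights alone (which is why no calibration data is needed), plus one scale per token for activations, computed on the fly at inference time. Values are rounded to the FP8 E4M3 grid (max 448). All function names here (`round_to_e4m3`, `quantize_weight_per_channel`, `quantize_activation_per_token`, `fp8_linear`) are hypothetical illustrations, not LLM Compressor or vLLM APIs; real kernels work on tensors and store actual 8-bit values.

```python
import math

# Largest finite value of FP8 E4M3 (the "e4m3fn" variant, e.g. torch.float8_e4m3fn).
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448 (illustrative)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp returns mag = m * 2**e with m in [0.5, 1), so floor(log2(mag)) == e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, -6)          # -6 is the smallest normal exponent; below it, spacing stays 2**-9
    step = 2.0 ** (exp - 3)       # 3 mantissa bits -> 8 representable steps per power of two
    q = round(mag / step) * step  # Python's round() is round-half-to-even
    return sign * min(q, FP8_E4M3_MAX)


def quantize_weight_per_channel(weight):
    """Offline, data-free step: one scale per output row of the weight matrix."""
    q_rows, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row) or 1.0  # avoid a zero scale for all-zero rows
        scale = absmax / FP8_E4M3_MAX
        q_rows.append([round_to_e4m3(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def quantize_activation_per_token(x):
    """Runtime step: the scale is derived from the live activation vector."""
    absmax = max(abs(v) for v in x) or 1.0
    scale = absmax / FP8_E4M3_MAX
    return [round_to_e4m3(v / scale) for v in x], scale


def fp8_linear(x, q_weight, w_scales):
    """y = W x using FP8-grid operands, rescaled by both scales afterwards."""
    xq, x_scale = quantize_activation_per_token(x)
    return [
        sum(a * b for a, b in zip(xq, row)) * x_scale * w_scale
        for row, w_scale in zip(q_weight, w_scales)
    ]


# Toy example: compare the quantized result with full precision.
W = [[0.5, -1.25, 2.0], [0.01, 0.02, -0.03]]
x = [1.0, 2.0, 3.0]
q_W, scales = quantize_weight_per_channel(W)
approx = fp8_linear(x, q_W, scales)
exact = [sum(a * b for a, b in zip(x, row)) for row in W]
print("fp8:", approx)
print("exact:", exact)
```

With only 3 mantissa bits, the quantized outputs land close to, but not exactly on, the full-precision results. Per-token activation scales track each input's own range at runtime, which is what "dynamic" refers to.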