bdellabe committed
Commit a5c5340 · verified · 1 Parent(s): fa85889

Update README.md

Files changed (1)
  1. README.md +25 -1
README.md CHANGED
@@ -36,4 +36,28 @@ FP8:
36
  |------------------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
37
  |gsm8k_platinum_cot_llama|      3|flexible-extract|     5|exact_match|↑  |0.9768|±  |0.0043|
38
  |                        |       |strict-match    |     5|exact_match|↑  |0.9777|±  |0.0043|
39
- ```
39
+ ```
40
+
41
+ ## Creation
42
+
43
+ This model was created by applying data-free FP8 Dynamic quantization with [LLM Compressor](https://github.com/vllm-project/llm-compressor), as presented in the code snippet below.
44
+
45
+ <details>
46
+
47
+ ```python
48
+ from llmcompressor import model_free_ptq
49
+
50
+ MODEL_ID = "google/gemma-4-31B-it"
51
+ SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-block"
52
+
53
+ model_free_ptq(
54
+     model_stub=MODEL_ID,
55
+     save_directory=SAVE_DIR,
56
+     scheme="FP8_DYNAMIC",
57
+     ignore=["re:.*vision.*", "lm_head", "re:.*embed_tokens.*"],
58
+     max_workers=8,
59
+     device="cuda:0",
60
+ )
61
+ ```
62
+
63
+ </details>
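
Note on the added snippet: for readers unfamiliar with what "data-free FP8 Dynamic" means numerically, the following is a rough, standard-library-only sketch and not LLM Compressor's implementation. It assumes the usual reading of the scheme: each weight row gets a scale computed once from the weights themselves (so no calibration data is needed), while each activation row gets a scale recomputed at inference time. The helper names (`round_to_e4m3`, `quantize_dynamic`, `dequantize`) and the sample numbers are hypothetical.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    _, e = math.frexp(mag)
    exponent = max(e - 1, -6)     # -6 is the smallest normal exponent
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits: 8 steps per power of two
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_dynamic(values):
    """Symmetric quantization with a scale derived from the values themselves."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(q, scale):
    """Map quantized values back to the original range."""
    return [v * scale for v in q]


# Hypothetical data: one weight row (output channel) and one activation row (token).
weight_row = [0.02, -0.5, 0.13, 0.25]
token = [1.5, -3.0, 0.75, 12.0]

w_q, w_scale = quantize_dynamic(weight_row)  # done once, stored with the checkpoint
x_q, x_scale = quantize_dynamic(token)       # redone for every token at runtime

approx = sum(a * b for a, b in zip(dequantize(w_q, w_scale), dequantize(x_q, x_scale)))
exact = sum(a * b for a, b in zip(weight_row, token))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because both scales come directly from the tensors being quantized, no calibration dataset is involved, which is consistent with the "data-free" wording in the added README text.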