bdellabe commited on
Commit
743e397
·
verified ·
1 Parent(s): d620f6d

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model:
3
+ - moonshotai/Kimi-K2.6
4
+ tags:
5
+ - kimi
6
+ - nvfp4
7
+ - vllm
8
+ - compressed-tensors
9
+ name: RedHatAI/Kimi-K2.6-NVFP4
10
+ ---
11
+
12
+ # NVFP4-Quantized RedHatAI/Kimi-K2.6-NVFP4
13
+
14
+ This is a preliminary version (and subject to change) of NVFP4-quantized [moonshotai/Kimi-K2.6](https://huggingface.co/Kimi/Kimi-K2.6) model, for performant inference on NVIDIA Blackwell GPUs.
15
+ The model has both weights and activations quantized in NVFP4 format with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor).
16
+
17
+ It is compatible and tested against vllm v0.20.0. Deploy it via `vllm serve` using the recipes at https://recipes.vllm.ai/moonshotai/Kimi-K2.6.
18
+
19
+ # Creation Script:
20
+
21
+ <!-- Run this script with LLM Compressor main and latest transformers. -->
22
+ Kimi K2.6 support will land in https://github.com/vllm-project/llm-compressor/pull/2662. The script to create will be posted as a link shortly
23
+
24
+
25
+ # Preliminary Evaluations
26
+
27
+ 1) GSM8K Platinum:
28
+ ```
29
+ lm_eval --model local-chat-completions \
30
+ --tasks gsm8k_platinum_cot_llama \
31
+ --model_args "model=RedHatAI/Kimi-K2.6-NVFP4,max_length=262144,base_url=http://0.0.0.0:8000/v1/chat/completions,num_concurrent=128,max_retries=3,tokenized_requests=False,tokenizer_backend=None,timeout=1200" \
32
+ --num_fewshot 0 \
33
+ --apply_chat_template \
34
+ --gen_kwargs "do_sample=True,temperature=1.0,top_p=0.95,top_k=20,min_p=0.0,max_gen_toks=64000,presence_penalty=1.5,repetition_penalty=1.0,seed=5678"
35
+
36
+ ```
37
+
38
+ Recovery:
39
+
40
+ | | moonshotai/Kimi-K2.6<br> (original in W4A16) | RedHatAI/Kimi-K2.6-NVFP4<br> (this model) |
41
+ | -------- | :--------------------: | :------------------------------------: |
42
+ | Accuracy<br> | 94.29 | 93.96 |
43
+ | Recovery | \- | 99.6% |
44
+
45
+
46
+ **Note**: More rigorous evaluations are currently in progress and will be available soon.
47
+
48
+