Instructions to use deepseek-ai/deepseek-coder-6.7b-instruct with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use deepseek-ai/deepseek-coder-6.7b-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/deepseek-coder-6.7b-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use deepseek-ai/deepseek-coder-6.7b-instruct with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "deepseek-ai/deepseek-coder-6.7b-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/deepseek-coder-6.7b-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/deepseek-ai/deepseek-coder-6.7b-instruct
```
- SGLang
How to use deepseek-ai/deepseek-coder-6.7b-instruct with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepseek-ai/deepseek-coder-6.7b-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/deepseek-coder-6.7b-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepseek-ai/deepseek-coder-6.7b-instruct" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/deepseek-coder-6.7b-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use deepseek-ai/deepseek-coder-6.7b-instruct with Docker Model Runner:
```shell
docker model run hf.co/deepseek-ai/deepseek-coder-6.7b-instruct
```
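Both the vLLM and SGLang servers above expose the same OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The following is a minimal standard-library sketch, assuming the default ports used in the commands above (8000 for vLLM, 30000 for SGLang) and the usual OpenAI response shape (`choices[0].message.content` and `choices[0].finish_reason`). The helper names `build_chat_request`, `parse_chat_response` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumed endpoints, matching the ports in the commands above.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
SGLANG_URL = "http://localhost:30000/v1/chat/completions"
MODEL = "deepseek-ai/deepseek-coder-6.7b-instruct"


def build_chat_request(url: str, content: str, model: str = MODEL) -> urllib.request.Request:
    """Build the same POST request as the curl examples, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_chat_response(body: bytes) -> tuple:
    """Return (assistant text, finish_reason) from an OpenAI-style JSON body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


def ask(url: str, content: str, timeout: float = 60.0) -> tuple:
    """Send the request to a running server and parse its reply."""
    req = build_chat_request(url, content)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_chat_response(resp.read())
```

With a server running, `ask(VLLM_URL, "What is the capital of France?")` should return the answer text and a finish reason. A `finish_reason` of `"length"` rather than `"stop"` means generation ended at the token limit instead of at a stop token, which can be a symptom of the EOS misconfiguration discussed in the thread below.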
Confirming the EOS token? 32021 or 32014? Or both?
Hi
I'm having issues with my GGUF quantisations where the model won't stop generating, and generates endless <|EOT|> tokens.
I made the GGUF with special tokens set as per tokenizer_config.json, i.e. EOS is set to token ID 32014.
But in your README I realised you're actually setting it to 32021 for the Instruct models?
```python
# 32021 is the id of <|EOT|> token
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, top_k=50, top_p=0.95,
                         num_return_sequences=1, eos_token_id=32021)
```
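To see why the wrong EOS ID leads to endless `<|EOT|>` tokens, here is a toy greedy loop in plain Python. The "model" is a hypothetical stand-in that emits a few answer tokens (arbitrary IDs 100 to 102) and then repeats `<|EOT|>` (32021) forever; generation only ends early when the emitted token is in the configured set of stop IDs. This is an illustration of the stopping rule, not how any real model or library is implemented.

```python
# Token IDs quoted in this thread.
EOT_ID = 32021       # <|EOT|>, end of turn for the Instruct models
BASE_EOS_ID = 32014  # EOS for the base models


def fake_next_token(step: int) -> int:
    """Hypothetical Instruct-style model: three answer tokens, then <|EOT|> forever."""
    answer = [100, 101, 102]
    return answer[step] if step < len(answer) else EOT_ID


def generate(stop_ids: set, max_new_tokens: int) -> list:
    """Append tokens until one of stop_ids is emitted or max_new_tokens is reached."""
    out = []
    for step in range(max_new_tokens):
        tok = fake_next_token(step)
        out.append(tok)
        if tok in stop_ids:
            break  # the stop token itself is kept, as generate() does
    return out
```

Stopping on 32021 (or on both 32021 and 32014) ends right after the answer, while stopping only on 32014 runs until `max_new_tokens`, filling the tail with `<|EOT|>`. In Transformers, `eos_token_id` also accepts a list of IDs, so passing `eos_token_id=[32021, 32014]` to `generate` is one way to cover both.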
So I just wanted to double-check that for Instruct, EOS should be set to 32021, and that tokenizer_config.json is wrong in this regard?
Is there a reason that tokenizer_config.json and config.json don't have EOS set to 32021, but rather to 32014? Would you consider changing that, or do other aspects of model generation depend on 32014?
Thanks
TB
For the Instruct model, the eos_id is 32021, i.e. the <|EOT|> token. For the base model, the eos_id is 32014. We will set the correct eos_id for each model. Thanks for pointing it out.
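Given those two IDs, a downloaded model can be checked before converting it (for example to GGUF) by comparing both config files against the expected value. This is a sketch, assuming `config.json` holds an integer `eos_token_id`, and `tokenizer_config.json` stores `eos_token` either as a string or as a dict with a `content` key, plus an `added_tokens_decoder` table mapping ID strings to token entries. The function names are hypothetical.

```python
import json
from typing import Optional

# Expected EOS IDs as stated by the maintainers in this thread.
EXPECTED_EOS = {"instruct": 32021, "base": 32014}


def eos_id_from_tokenizer_config(tok_cfg: dict) -> Optional[int]:
    """Resolve eos_token to an ID via added_tokens_decoder; None if not found."""
    eos = tok_cfg.get("eos_token")
    if isinstance(eos, dict):  # AddedToken-style entry
        eos = eos.get("content")
    for token_id, entry in tok_cfg.get("added_tokens_decoder", {}).items():
        if entry.get("content") == eos:
            return int(token_id)
    return None


def check_eos(variant: str, config: dict, tok_cfg: dict) -> list:
    """List mismatches between the two configs and the expected EOS ID."""
    expected = EXPECTED_EOS[variant]
    problems = []
    if config.get("eos_token_id") != expected:
        problems.append(f"config.json eos_token_id={config.get('eos_token_id')}, expected {expected}")
    tok_id = eos_id_from_tokenizer_config(tok_cfg)
    if tok_id != expected:
        problems.append(f"tokenizer_config.json eos_token -> {tok_id}, expected {expected}")
    return problems


def check_files(variant: str, config_path: str, tok_cfg_path: str) -> list:
    """Load both JSON files from disk and run check_eos on them."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(tok_cfg_path, encoding="utf-8") as f:
        tok_cfg = json.load(f)
    return check_eos(variant, config, tok_cfg)
```

For an Instruct checkout, `check_files("instruct", "config.json", "tokenizer_config.json")` should return an empty list once both files point at 32021; any entries it returns name the file that still disagrees.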
Great, thank you for confirming so quickly.
I will re-make all my Instruct GGUF files once you've been able to update the tokenizer config.
I have fixed the mistakes in the Instruct models. Thanks!
Thanks very much, but could you update tokenizer_config.json as well? Or I can open a PR if you like.