Instructions to use wolfram/miqu-1-103b with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use wolfram/miqu-1-103b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wolfram/miqu-1-103b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("wolfram/miqu-1-103b")
model = AutoModelForCausalLM.from_pretrained("wolfram/miqu-1-103b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use wolfram/miqu-1-103b with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "wolfram/miqu-1-103b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wolfram/miqu-1-103b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
docker model run hf.co/wolfram/miqu-1-103b
- SGLang
How to use wolfram/miqu-1-103b with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "wolfram/miqu-1-103b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wolfram/miqu-1-103b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "wolfram/miqu-1-103b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wolfram/miqu-1-103b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use wolfram/miqu-1-103b with Docker Model Runner:
docker model run hf.co/wolfram/miqu-1-103b
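Both the vLLM and SGLang servers above expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the curl calls can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes a server is already running on the default port from the commands above (8000 for vLLM, 30000 for SGLang). Note that this endpoint formats the messages with whatever chat template the server loads for the model.

```python
import json
import urllib.request

# Default endpoints from the commands above (assumes the servers run locally).
VLLM_URL = "http://localhost:8000/v1/chat/completions"
SGLANG_URL = "http://localhost:30000/v1/chat/completions"
MODEL = "wolfram/miqu-1-103b"


def build_chat_request(url: str, user_message: str, model: str = MODEL) -> urllib.request.Request:
    """Build the same POST request as the curl examples, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(url: str, user_message: str) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(url, user_message)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Once a server is up, `chat(VLLM_URL, "What is the capital of France?")` should return the reply text; pass `SGLANG_URL` instead to talk to the SGLang server.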
Please use the correct prompt template
I think the prompt template you supplied in your README is wrong. Using the correct template for Miqu makes a huge difference. Please see my post on Reddit for more details.
The prompt should be:
[INST] {System}[/INST][INST] {User}[/INST] {Assistant}
https://www.reddit.com/r/LocalLLaMA/comments/1b1gxmq/the_definite_correct_miqu_prompt/
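The template above can be filled in programmatically when sending raw text to a completion-style endpoint (for example vLLM's `/v1/completions`) instead of a chat endpoint. This is a sketch under assumptions: `build_op_prompt` is a hypothetical helper, it covers only the single-turn layout shown in the post, and it stops right after the final `[/INST]` on the assumption that the model generates the ` {Assistant}` part itself, including its leading space.

```python
def build_op_prompt(system: str, user: str) -> str:
    """Fill in the single-turn layout [INST] {System}[/INST][INST] {User}[/INST] {Assistant}.

    The returned string ends right after the final [/INST]; the assistant
    text (and its leading space) is assumed to be generated by the model.
    """
    return f"[INST] {system}[/INST][INST] {user}[/INST]"
```

For example, `build_op_prompt("You are a helpful assistant.", "Who are you?")` yields `[INST] You are a helpful assistant.[/INST][INST] Who are you?[/INST]`.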
Just saw this linked from Reddit and wanted to say there is another prompt format that I found too:
[INST] {prompt1}
[/INST]{response1}[INST] {prompt2}
[/INST]{response2}
I originally got it by asking Miqu about the format he saw during training, starting with a blank prompt template and restarting each time he got confused (he seems to find it hard to write "[/INST]" more than a couple of times...).
Main discussion here: https://huggingface.co/miqudev/miqu-1-70b/discussions/25
(I've tried a few ways to add the system prompt, which I outline in that thread.)
I've also confirmed this is the prompt format he starts to hallucinate when you do a merge with CodeLlama-70b, e.g.:
[INST] hi
[/INST] hello[INST] my name is...
[/INST] nice to meet you...
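The multi-turn layout in this post can be built the same way for a raw-completion endpoint. A minimal sketch, assuming a hypothetical `build_multiturn_prompt` helper: earlier `(prompt, response)` pairs are concatenated exactly as in the example above, responses are inserted verbatim (so any leading space the model produced is kept), and the string ends with a newline and `[/INST]` so the model writes the next reply. Adding a system prompt is discussed in the linked thread and is not handled here.

```python
def build_multiturn_prompt(history: list[tuple[str, str]], next_prompt: str) -> str:
    """Build a raw prompt in the layout:

        [INST] {prompt1}
        [/INST]{response1}[INST] {prompt2}
        [/INST]{response2}

    `history` holds (prompt, response) pairs from earlier turns; the final
    turn is left open so the model generates its response.
    """
    # Each completed turn: "[INST] <prompt>", newline, "[/INST]", then the response as-is.
    parts = [f"[INST] {prompt}\n[/INST]{response}" for prompt, response in history]
    # Open turn for the new user message.
    parts.append(f"[INST] {next_prompt}\n[/INST]")
    return "".join(parts)
```

For instance, `build_multiturn_prompt([("hi", " hello")], "my name is...")` reproduces the first two lines of the example above, ending in `[/INST]`; appending the model's reply to `history` extends the conversation.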
Interestingly, he will also often end his responses with "Confidence X%" if you use the suggested template!
I tried the prompt format the OP suggested and it performs much worse for me on coding tasks, though I can't rule out bugs in Ollama's templating or the wrapped llama.cpp server. Just about any prompt template other than the one I suggested makes him much worse at coding tasks overall, but with the correct/suggested template he is actually one of the best for tasks like refactoring.
Another good test I found is to ask it about some obscure machine learning papers from the 80s and 90s: Miqu is the only 70b model that seems to be (properly) trained on these. If you use the wrong prompt format, he will quite confidently get mixed up and use the wrong names, etc.
E.g., try asking in a roundabout way about the paper:
"Recursive distributed representations", J. B. Pollack - Artificial Intelligence 46 (1-2):77-105 (1990)
It's a particularly good one to ask about, as there was a lot of other research around the same time that used similar ideas/names, and all the other 70b models seem to have some kind of "superimposed" version of the facts. With the correct/suggested prompt format, Miqu knows the J stands for Jordan and can tell you quite well what this paper is about.