How to create in Ollama??
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51186]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51187]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51188]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51189]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51190]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51191]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51192]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51193]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51194]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51195]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51196]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51197]
2023/12/16 21:21:42 parser.go:62: WARNING: Unknown command: [PAD51198]
Error: no FROM line for the model was specified
This is the error I get when I run ollama create phi-2-q4 -f ./phi-2_Q8_0.gguf
Any idea how to create it?
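A note on the log above: the "Unknown command" warnings and the "no FROM line" error are what one would expect if the file passed to -f is parsed as a Modelfile, and here -f points at the GGUF weights themselves. The sketch below writes a minimal Modelfile whose FROM line points at the GGUF file and prints the matching create command. The helper names build_modelfile and write_modelfile are hypothetical, and this only addresses the FROM error; it assumes the installed Ollama build can load the phi-2 architecture at all.

```python
from pathlib import Path


def build_modelfile(gguf_path: str) -> str:
    """Return minimal Modelfile text that points at a local GGUF file."""
    # FROM is the instruction the error message says is missing.
    return f"FROM {gguf_path}\n"


def write_modelfile(gguf_path: str, out_path: str = "Modelfile") -> Path:
    """Write the Modelfile to disk and return its path (hypothetical helper)."""
    path = Path(out_path)
    path.write_text(build_modelfile(gguf_path), encoding="utf-8")
    return path


if __name__ == "__main__":
    # File and model names are taken from the post above.
    modelfile = write_modelfile("./phi-2_Q8_0.gguf")
    # Pass the Modelfile, not the .gguf file, to -f.
    print(f"ollama create phi-2-q4 -f {modelfile}")
```

Running the script in the directory that holds the GGUF file leaves a one-line Modelfile next to it; the printed command can then be run as-is.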
Some modifications are needed, as phi-2 support isn't merged into llama.cpp yet: https://github.com/mrgraycode/llama.cpp/commit/12cc80cb8975aea3bc9f39d3c9b84f7001ab94c5#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR6239 In the meantime, you can fork it.
Yeah, but it is not working in Ollama.
Support has been added to llama.cpp master, so the ball is in Ollama's court now.
https://github.com/ggerganov/llama.cpp/commit/b9e74f9bca5fdf7d0a22ed25e7a9626335fdfa48
LM Studio Beta is updated.
So I am trying there now!
Thanks!!
I find it interesting for a model with only about 3B parameters, which you will soon be able to run anywhere. It won't do math on its own, though; you would probably have to use chain-of-thought prompting or external tools for post-processing.
@namankhator: thanks for the feedback! Please recall that this is a base completion model, so the format of your question really matters. When you give an instruction, I recommend using the format:
Instruct: YOUR INSTRUCTION
Output:
Moreover, for any kind of reasoning it's useful to add "Let's think step by step", even for easy questions. If you do both of those things, it works for your example.
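The format recommended above can be sketched as a small prompt builder. This is an illustration only: build_prompt is a hypothetical helper, and appending "Let's think step by step." to the end of the instruction is an assumption, since the post does not say where the phrase should go.

```python
def build_prompt(instruction: str, step_by_step: bool = False) -> str:
    """Wrap an instruction in the Instruct:/Output: format for a base model."""
    instruction = instruction.strip()
    if step_by_step:
        # Assumption: the reasoning cue is appended to the instruction itself.
        instruction = f"{instruction} Let's think step by step."
    # The model is expected to continue the text after "Output:".
    return f"Instruct: {instruction}\nOutput:"


if __name__ == "__main__":
    print(build_prompt("What is 17 * 23?", step_by_step=True))
```

The resulting string is passed unchanged as the completion prompt to whichever runtime is in use. Because this is a base completion model, it may be worth setting a stop sequence such as "Instruct:" so generation does not run on into a new made-up instruction.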
Hey @sebubeck
Thanks for the recommendations.
I believe Instruct and Output are already set. (attached image from LM Studio)
I tried the prompt you suggested, but it still did not work.
I will try tasks other than reasoning and will post an update if need be.




