Instructions to use TheBloke/Llama-2-7B-Chat-GGML with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use TheBloke/Llama-2-7B-Chat-GGML with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="TheBloke/Llama-2-7B-Chat-GGML")

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("TheBloke/Llama-2-7B-Chat-GGML", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use TheBloke/Llama-2-7B-Chat-GGML with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "TheBloke/Llama-2-7B-Chat-GGML"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "TheBloke/Llama-2-7B-Chat-GGML",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
    docker model run hf.co/TheBloke/Llama-2-7B-Chat-GGML
- SGLang
How to use TheBloke/Llama-2-7B-Chat-GGML with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "TheBloke/Llama-2-7B-Chat-GGML" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "TheBloke/Llama-2-7B-Chat-GGML",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "TheBloke/Llama-2-7B-Chat-GGML" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "TheBloke/Llama-2-7B-Chat-GGML",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use TheBloke/Llama-2-7B-Chat-GGML with Docker Model Runner:
    docker model run hf.co/TheBloke/Llama-2-7B-Chat-GGML
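The curl calls in the vLLM and SGLang sections post the same JSON body to an OpenAI-compatible `/v1/completions` endpoint, differing only in port (8000 and 30000). As a minimal sketch, the same request can be built from Python using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical and not part of vLLM or SGLang. The sketch assumes that a server is already running and can actually load this repository.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl examples.

    Hypothetical helper: base_url is e.g. "http://localhost:8000" for the
    vLLM example or "http://localhost:30000" for the SGLang example.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60):
    """Send the request and return the parsed JSON response.

    Assumes a server is listening at the request URL; raises
    urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server listening, `complete(build_completion_request("http://localhost:8000", "TheBloke/Llama-2-7B-Chat-GGML", "Once upon a time,"))` would send the request. In the OpenAI completions format the generated text is normally found under `choices[0]["text"]`, though the exact response shape depends on the server.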
- Request: DOI (#48, opened about 1 year ago by sdf8; 1 comment)
- Request: give permission to use meta llama model (#46, opened over 1 year ago by maitrijain)
- Request: DOI (#45, opened over 1 year ago by maitrijain)
- Quantize model (#43, opened almost 2 years ago by Kunalbhagat88; 2 comments)
- error loading model: unrecognized tensor type 10 (#42, opened about 2 years ago by ashhar-01)
- OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack (#41, opened over 2 years ago by swathiKonakanchi; 1 comment)
- Can't git clone repo's contents (#40, opened over 2 years ago by s3nd41; 1 comment)
- OSError: TheBloke/Llama-2-7B-Chat-GGML does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack. (#39, opened over 2 years ago by SJay747; 1 comment)
- Program terminated while giving multiple request at a time (#38, opened over 2 years ago by krishnapiya; 1 comment)
- Could not load Llama model from path (#37, opened over 2 years ago by Lozzoya; 1 comment)
- How to configure with TGI? (#34, opened over 2 years ago by luissimoes; 4 comments)
- Add precise license metadata as part of Hacktoberfest 2023 (#33, opened over 2 years ago by abhicodes)
- terminate called after throwing an instance of 'std::runtime_error' | what(): unexpectedly reached end of file | Aborted (core dumped) (#31, opened over 2 years ago by baobirdy)
- Traceback about ui_model_menu.py (#30, opened over 2 years ago by AAAkuan; 1 comment)
- Can you fine-tune Llama-2-7B-Chat-GGML and other quantized version of llama? (#28, opened almost 3 years ago by obscureagent; 3 comments)
- Help to deploy it ?? (#25, opened almost 3 years ago by deepakkaura26)
- [AUTOMATED] Model Memory Requirements (#24, opened almost 3 years ago by model-sizer-bot)
- Prompts for Question Answering Assistant (#21, opened almost 3 years ago by pratikhublikar; 3 comments)
- Why does llama-2-7b-chat.ggmlv3.q2_K.bin always load using Metal? (#19, opened almost 3 years ago by Auxon)
- Model giving out weird responses (#18, opened almost 3 years ago by kiran2405; 3 comments)
- updated README.md on how to make changes (#17, opened almost 3 years ago by absy)
- How to convert model into GGML format? (#13, opened almost 3 years ago by zbruceli; 54 comments; reactions: ❤️👍 4)
- How to change model in config.yml file? (#12, opened almost 3 years ago by ianuvrat)
- Tokenizer behaving differently than Meta's original. (#5, opened almost 3 years ago by viniciusarruda; 2 comments)
- Benchmark of different GGML version (#2, opened almost 3 years ago by aiapprentice101; 2 comments)
- Damn exciting this is! (#1, opened almost 3 years ago by mechanicmuthu; 1 comment; reactions: 🤯👍 3)