My thoughts
I have used a few Nemo models: the base, Atlantis, Dory, and a few others. I was gonna go back to the base one, but it had so many little issues. Still, the context, size, and speed were all too good to try a different kind of model. Then came this model, Pantheon-RP-1.5-12b-Nemo. I can't explain how well this one followed prompts while still having so much character in its responses. The only issue I can think of, and it's not much of an issue, is that I need it to give less and shorten its replies a little; a lot of the time it goes past what I want by a good bit. All in all, though, this model is fantastic, and it's great that when I say "be dark and gritty" I don't end up with "and we all end up happy and sniffing flowers on a sunny day."