Instructions to use Gryphe/Pantheon-RP-1.5-12b-Nemo with libraries, inference providers, notebooks, and local apps.

Libraries

How to use Gryphe/Pantheon-RP-1.5-12b-Nemo with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Gryphe/Pantheon-RP-1.5-12b-Nemo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Gryphe/Pantheon-RP-1.5-12b-Nemo")
model = AutoModelForCausalLM.from_pretrained("Gryphe/Pantheon-RP-1.5-12b-Nemo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Gryphe/Pantheon-RP-1.5-12b-Nemo with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Gryphe/Pantheon-RP-1.5-12b-Nemo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Gryphe/Pantheon-RP-1.5-12b-Nemo",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
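
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming the server started by `vllm serve` is listening on `http://localhost:8000`. The helper names `build_chat_request` and `send_chat_request` are hypothetical, and `max_tokens` is the optional OpenAI-style field for capping reply length; it is not part of the original curl example.

```python
import json
import urllib.request

MODEL_ID = "Gryphe/Pantheon-RP-1.5-12b-Nemo"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=None):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: base_url is an assumption matching the curl example.
    """
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    if max_tokens is not None:
        # Optional cap on the number of generated tokens (an addition, not in the curl call)
        payload["max_tokens"] = max_tokens
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Requires the server from the step above to be running
    req = build_chat_request("What is the capital of France?")
    print(send_chat_request(req))
```

Building the request separately from sending it makes it possible to inspect the payload without a running server, and `base_url` lets the same sketch target any other OpenAI-compatible endpoint.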

Use Docker

docker model run hf.co/Gryphe/Pantheon-RP-1.5-12b-Nemo

SGLang

How to use Gryphe/Pantheon-RP-1.5-12b-Nemo with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Gryphe/Pantheon-RP-1.5-12b-Nemo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Gryphe/Pantheon-RP-1.5-12b-Nemo",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Gryphe/Pantheon-RP-1.5-12b-Nemo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Gryphe/Pantheon-RP-1.5-12b-Nemo",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Gryphe/Pantheon-RP-1.5-12b-Nemo with Docker Model Runner:
```
docker model run hf.co/Gryphe/Pantheon-RP-1.5-12b-Nemo
```

my thoughts

by Ardvark123 - opened Aug 8, 2024

I have used a few Nemo models: the base, Atlantis, dory, and a few others. I was going to go back to the base one, but it had so many little issues. Still, the context, size, and speed were all too good to try another. Then came this model. I can't explain how well this one follows prompts while putting so much character in its responses. The only issue I can think of, and it's not much of an issue, is that I need it to give a little less and shorten its replies; a lot of the time it goes past what I want by a good bit. All in all, though, this model is fantastic, and it's great that when I say "be dark and gritty" I don't end up with "and we all end up happy and sniffing flowers on a sunny day."
