Instructions to use QuantFactory/Qwen2-7B-BD-RP-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="QuantFactory/Qwen2-7B-BD-RP-GGUF", filename="Qwen2-7B-BD-RP.Q2_K.gguf", )
llm.create_chat_completion( messages = "{\n \"question\": \"What is my name?\",\n \"context\": \"My name is Clara and I live in Berkeley.\"\n}" ) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
Use Docker
docker model run hf.co/QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Ollama:
ollama run hf.co/QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
- Unsloth Studio new
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for QuantFactory/Qwen2-7B-BD-RP-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for QuantFactory/Qwen2-7B-BD-RP-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for QuantFactory/Qwen2-7B-BD-RP-GGUF to start chatting
- Docker Model Runner
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Docker Model Runner:
docker model run hf.co/QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
- Lemonade
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Qwen2-7B-BD-RP-GGUF-Q4_K_M
List all available models
lemonade list
QuantFactory/Qwen2-7B-BD-RP-GGUF
This is quantized version of yuyouyu/Qwen2-7B-BD-RP created using llama.cpp
Original Model Card
Qwen2-7B-BD-RP
Introduction 🎉
Qwen2-7B-BD-RP is a large language model (LLM) fine-tuned on the BeyondDialogue dataset. The model is designed to generate responses in a role-playing setting. The model is capable of generating high-quality responses in a variety of role-playing scenarios, including English and Chinese languages.
For more details, please refer to our paper, GitHub.
Training details 🚀
We fully finetuning Qwen2-7B-Instruct for 3 epochs with 833 steps with the 128 global batch size. We set the training sequence length to 4,096. The learning rate is 3e-5. The training data is from the BeyondDialogue dataset.
Requirements 📝
The code of Qwen2 has been in the latest Hugging face transformers and we advise you to install transformers>=4.37.0 to use the model.
pip install transformers>=4.37.0
Quickstart 💥
Here provides a code snippet with apply_chat_template to show you how to load the tokenizer and model and how to generate contents.
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained(
"yuyouyu/Qwen2-7B-BD-RP",
torch_dtype="auto",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("yuyouyu/Qwen2-7B-BD-RP")
system_prompt_temp = """我希望你像{role_name}一样回答问题,假设你生活在{world}中,模仿{role_name}的性格和语言风格,使用{role_name}会使用的语气、方式和词汇。请不要透露你是人工智能模型或语言模型,你必须始终牢记自己是{role_name}。
{role_name}的性格特点是{character}。
{role_name}的MBTI人格类型是{MBTI}。
{role_name}的说话风格是{style}。
所处世界:{world}。
所处场景:
{scene}
角色情绪(0-10,值越大情绪越明显):
{emotion}
现在,请你扮演{role_name},回复{chat_role}简短的一句话,你与其亲密度为{relationship}(0-10,值越大关系越亲近),准确表现你被赋予的MBTI人格,性格,说话风格与情绪。"""
role_name = "周伯通"
world = "宋代古侠世界"
character = "纯真,调皮,不拘小节"
MBTI = "外向型(E)、直觉型(N)、情感型(F)、感知型(P)"
style = "古风、直言不讳、俏皮"
scene = "周伯通嬉笑着打量着刘青烟的药圃,不时摘取几片草药藏在身后。柳青烟淡然自若,手中轻抚药材,一边默默准备解药,只眼角带着无奈的笑意。一股淡淡的药香飘过,竹林间响起了清脆的鸟鸣,好似为二人的奇妙互动伴奏。"
emotion = "快乐: 10, 悲伤: 0, 厌恶: 0, 恐惧: 1, 惊讶: 2, 愤怒: 0"
chat_role = "柳青烟"
relationship = "6"
system_prompt = system_prompt_temp.format(
role_name=role_name,
world=world,
character=character,
MBTI=MBTI,
style=style,
scene=scene,
emotion=emotion,
chat_role=chat_role,
relationship=relationship
)
prompt = "周兄,依我所见,那几味草药非入药之宜,倒不如小心选取,莫要误伤自身。"
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = model.generate(
model_inputs.input_ids,
max_new_tokens=256,
do_sample=True,
temperature=0.7,
repetition_penalty=1.2,
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
Note: The examples for Qwen2-7B-BD-RP use Chinese role-playing. For English examples, please refer to our other training model repository -- Mistral-Nemo-BD-RP.
Evaluation 🏆
We use objective questions to assess eight dimensions: Character, Style, Emotion, Relationship, Personality, Human-likeness, Coherence, and Role Consistency. The metric design can be find in our paper. The evaluation code can be found in GitHub. The results are shown below:
| Model | Character ↑ | Style ↑ | Emotion ↓ | Relationship ↓ | Personality ↑ | Avg. ↑ | Human-likeness ↑ | Role Choice ↑ | Coherence ↑ |
|---|---|---|---|---|---|---|---|---|---|
| General Baselines(Proprietary) | |||||||||
| GPT-4o | 74.32 ± 1.15 | 81.67 ± 1.51 | 16.31 ± 0.48 | 12.13 ± 0.66 | 66.58 ± 4.41 | 78.83 ± 1.64 | 67.33 ± 3.95 | 87.33 ± 3.86 | 99.67 ± 0.33 |
| GPT-3.5-Turbo | 72.26 ± 1.27 | 73.66 ± 1.73 | 17.79 ± 0.56 | 14.17 ± 0.73 | 66.92 ± 4.85 | 76.18 ± 1.83 | 33.33 ± 4.43 | 83.00 ± 4.68 | 97.33 ± 1.17 |
| Moonshot-v1-8k | 74.06 ± 1.19 | 80.64 ± 1.51 | 16.17 ± 0.47 | 13.42 ± 0.70 | 67.00 ± 4.87 | 78.42 ± 1.75 | 44.00 ± 4.33 | 86.67 ± 3.75 | 99.33 ± 0.46 |
| Yi-Large-Turbo | 75.13 ± 1.22 | 79.18 ± 1.58 | 16.44 ± 0.49 | 13.48 ± 0.67 | 68.25 ± 4.61 | 78.53 ± 1.72 | 47.00 ± 4.60 | 84.33 ± 3.67 | 92.67 ± 2.39 |
| Deepseek-Chat | 75.46 ± 1.14 | 81.49 ± 1.51 | 15.92 ± 0.46 | 12.42 ± 0.63 | 67.92 ± 4.57 | 79.30 ± 1.66 | 52.33 ± 4.95 | 83.00 ± 4.68 | 96.67 ± 1.00 |
| Baichuan4 | 71.82 ± 1.25 | 76.92 ± 1.52 | 17.57 ± 0.52 | 12.30 ± 0.62 | 67.08 ± 4.75 | 77.19 ± 1.73 | 45.33 ± 4.31 | 82.33 ± 4.49 | 99.33 ± 0.46 |
| Hunyuan | 73.77 ± 1.18 | 78.75 ± 1.56 | 17.24 ± 0.48 | 13.22 ± 0.68 | 67.00 ± 4.39 | 77.81 ± 1.66 | 53.00 ± 4.29 | 84.33 ± 4.52 | 98.33 ± 0.84 |
| Role-play Expertise Baselines | |||||||||
| Index-1.9B-Character | 73.33 ± 1.32 | 76.48 ± 1.50 | 17.99 ± 0.53 | 13.58 ± 0.71 | 66.33 ± 4.57 | 76.92 ± 1.73 | 21.67 ± 3.96 | 78.67 ± 5.14 | 69.67 ± 3.85 |
| CharacterGLM-6B | 73.36 ± 1.28 | 76.08 ± 1.55 | 18.58 ± 0.55 | 14.27 ± 0.79 | 67.33 ± 4.34 | 76.79 ± 1.70 | 16.00 ± 2.38 | 81.00 ± 4.40 | 25.67 ± 3.48 |
| Baichuan-NPC-Turbo | 75.19 ± 1.23 | 79.15 ± 1.38 | 17.24 ± 0.51 | 13.10 ± 0.69 | 65.33 ± 4.84 | 77.87 ± 1.73 | 56.00 ± 4.66 | 86.33 ± 4.90 | 99.00 ± 0.56 |
| General Baselines(Open-source) | |||||||||
| Yi-1.5-9B-Chat | 75.31 ± 1.20 | 76.78 ± 1.49 | 16.67 ± 0.52 | 12.75 ± 0.66 | 67.42 ± 4.63 | 78.02 ± 1.70 | 38.67 ± 4.39 | 84.00 ± 4.61 | 92.67 ± 1.79 |
| GLM-4-9b-chat | 74.26 ± 1.19 | 78.40 ± 1.55 | 17.18 ± 0.50 | 14.48 ± 0.74 | 67.17 ± 4.93 | 77.63 ± 1.78 | 47.67 ± 4.25 | 83.33 ± 4.51 | 99.33 ± 0.46 |
| Mistral-Nemo-Instruct-2407 | 74.12 ± 1.17 | 77.04 ± 1.48 | 17.00 ± 0.43 | 13.50 ± 0.67 | 67.00 ± 4.30 | 77.53 ± 1.61 | 53.67 ± 4.66 | 82.67 ± 4.77 | 74.33 ± 3.77 |
| Qwen2-7B-Instruct | 75.39 ± 1.13 | 77.68 ± 1.65 | 17.64 ± 0.56 | 13.43 ± 0.7 | 67.75 ± 4.44 | 77.95 ± 1.70 | 48.00 ± 4.66 | 83.33 ± 4.48 | 99.00 ± 0.56 |
| Qwen2-7B-BD-RP | 78.67 ± 1.12* | 82.52 ± 1.33* | 15.68 ± 0.5* | 11.22 ± 0.72* | 69.67 ± 4.27 | 80.79 ± 1.59* | 64.33 ± 3.80* | 87.33 ± 3.74 | 99.00 ± 0.56 |
Citation 📖
Please cite our work if you found the resources in this repository useful:
@article{yu2024beyond,
title = {BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model},
author = {Yu, Yeyong and Yu, Runsheng and Wei, Haojie and Zhang, Zhanqiu and Qian, Quan},
year = {2024},
journal = {arXiv preprint arXiv:2408.10903},
}
Acknowledgements 🥰
We would like to express our sincere gratitude to Tencent LightSpeed Studios for their invaluable support in this project. Their contributions and encouragement have been instrumental in the successful completion of our work.
- Downloads last month
- 730
2-bit
3-bit
4-bit
5-bit
6-bit
8-bit