Instructions to use QuantFactory/Qwen2-7B-BD-RP-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="QuantFactory/Qwen2-7B-BD-RP-GGUF",
	filename="Qwen2-7B-BD-RP.Q2_K.gguf",
)

llm.create_chat_completion(
	messages = [
		{"role": "user", "content": "My name is Clara and I live in Berkeley. What is my name?"}
	]
)
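
The call above returns a dictionary in the OpenAI chat-completion layout, so the reply text still has to be pulled out of it. The following is a minimal sketch of one way to do that for a role-play turn; the helper names `build_messages`, `extract_reply`, and `chat_once` are hypothetical and not part of llama-cpp-python, and `llm` is assumed to be an already loaded `Llama` instance.

```python
from typing import Any, Dict, List


def build_messages(system_prompt: str, user_message: str) -> List[Dict[str, str]]:
    """Return an OpenAI-style message list: one system turn, then one user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


def extract_reply(completion: Dict[str, Any]) -> str:
    """Pull the assistant text out of a chat-completion style dictionary."""
    return completion["choices"][0]["message"]["content"]


def chat_once(llm: Any, system_prompt: str, user_message: str, max_tokens: int = 256) -> str:
    """Send one turn to a llama_cpp.Llama-like object and return the reply text."""
    completion = llm.create_chat_completion(
        messages=build_messages(system_prompt, user_message),
        max_tokens=max_tokens,
    )
    return extract_reply(completion)
```

With a loaded model, `chat_once(llm, system_prompt, "...")` would return the character's reply as a plain string.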

Local Apps

llama.cpp

How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M

Use Docker

docker model run hf.co/QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
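
Once `llama-server` is running, it can be queried over its OpenAI-compatible HTTP endpoint. The sketch below uses only the Python standard library and assumes the server listens on its default address, `http://localhost:8080`; the function names are hypothetical, and the address should be adjusted if the server was started with a different host or port.

```python
import json
import urllib.request


def build_chat_request(messages, base_url="http://localhost:8080", max_tokens=256):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# request = build_chat_request([{"role": "user", "content": "Hello"}])
# print(send_chat_request(request))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.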

Ollama
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Ollama:
```
ollama run hf.co/QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
```

Unsloth Studio

How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Qwen2-7B-BD-RP-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for QuantFactory/Qwen2-7B-BD-RP-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for QuantFactory/Qwen2-7B-BD-RP-GGUF to start chatting

Docker Model Runner
How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Docker Model Runner:
```
docker model run hf.co/QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M
```

Lemonade

How to use QuantFactory/Qwen2-7B-BD-RP-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull QuantFactory/Qwen2-7B-BD-RP-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen2-7B-BD-RP-GGUF-Q4_K_M

List all available models

lemonade list

QuantFactory/Qwen2-7B-BD-RP-GGUF

This is a quantized version of yuyouyu/Qwen2-7B-BD-RP, created using llama.cpp.

Original Model Card

Qwen2-7B-BD-RP

Introduction 🎉

Qwen2-7B-BD-RP is a large language model (LLM) fine-tuned on the BeyondDialogue dataset for role-playing. It is designed to generate high-quality, in-character responses across a variety of role-playing scenarios in both English and Chinese.

For more details, please refer to our paper and GitHub repository.

Training details 🚀

We fully fine-tuned Qwen2-7B-Instruct for 3 epochs (833 steps) with a global batch size of 128. The training sequence length was set to 4,096 and the learning rate to 3e-5. The training data comes from the BeyondDialogue dataset.
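
As a rough sanity check, the figures above imply the approximate size of the training run. The sketch below assumes that 833 is the total number of optimizer steps across all 3 epochs and that every step consumes exactly 128 sequences; neither assumption is confirmed by the source, so the results are estimates only.

```python
# Figures taken from the training details.
total_steps = 833
global_batch_size = 128
epochs = 3
max_seq_len = 4096

# Assumption: each step processes one full global batch.
sequences_seen = total_steps * global_batch_size

# Assumption: the 833 steps are spread evenly over the 3 epochs.
sequences_per_epoch = sequences_seen / epochs

# Upper bound only: real sequences are usually shorter than max_seq_len.
max_tokens_seen = sequences_seen * max_seq_len

print(f"sequences processed (all epochs): {sequences_seen:,}")
print(f"approx. sequences per epoch:      {sequences_per_epoch:,.0f}")
print(f"upper bound on tokens processed:  {max_tokens_seen:,}")
```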

Requirements 📝

Qwen2 support has been merged into Hugging Face transformers, and we advise you to install transformers>=4.37.0 to use the model.

pip install "transformers>=4.37.0"
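
To confirm that an existing environment meets this requirement before loading the model, a small check along these lines may help. It is a sketch using only the standard library; `parse_version`, `meets_minimum`, and `installed_transformers_version` are hypothetical helper names, and the parser only compares the first three numeric components of the version string.

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '4.37.0' or '4.40.0.dev0' into a comparable (major, minor, patch) tuple."""
    numbers = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str = "4.37.0") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

For example, `meets_minimum(installed_transformers_version() or "0")` evaluates to `False` when transformers is absent or older than 4.37.0.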

Quickstart 💥

The following code snippet uses apply_chat_template to show how to load the tokenizer and model and how to generate content.

from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "yuyouyu/Qwen2-7B-BD-RP",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("yuyouyu/Qwen2-7B-BD-RP")

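# System prompt template, kept in Chinese because this example targets Chinese role-play.
# English gist: answer as {role_name} living in {world}, imitating that character's personality,
# tone, and vocabulary without revealing that you are an AI; then list the personality traits,
# MBTI type, speaking style, world, scene, and emotion scores (0-10), and ask for one short
# reply to {chat_role} at intimacy level {relationship} (0-10).
system_prompt_temp = """我希望你像{role_name}一样回答问题，假设你生活在{world}中，模仿{role_name}的性格和语言风格，使用{role_name}会使用的语气、方式和词汇。请不要透露你是人工智能模型或语言模型，你必须始终牢记自己是{role_name}。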
{role_name}的性格特点是{character}。
{role_name}的MBTI人格类型是{MBTI}。
{role_name}的说话风格是{style}。
所处世界：{world}。
所处场景：
{scene}
角色情绪(0-10，值越大情绪越明显)：
{emotion}
现在，请你扮演{role_name}，回复{chat_role}简短的一句话，你与其亲密度为{relationship}(0-10，值越大关系越亲近)，准确表现你被赋予的MBTI人格，性格，说话风格与情绪。"""

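role_name = "周伯通"  # Zhou Botong
world = "宋代古侠世界"  # the martial-arts (wuxia) world of the Song dynasty
character = "纯真，调皮，不拘小节"  # innocent, mischievous, unconcerned with formalities
MBTI = "外向型（E）、直觉型（N）、情感型（F）、感知型（P）"  # Extraverted (E), Intuitive (N), Feeling (F), Perceiving (P)
style = "古风、直言不讳、俏皮"  # archaic, outspoken, playful
# Scene: Zhou Botong grins as he looks over Liu Qingyan's herb garden, now and then plucking a few
# herbs and hiding them behind his back. Liu Qingyan stays composed, quietly preparing an antidote
# with only a helpless smile. A faint scent of medicine drifts by and birdsong rings in the bamboo grove.
scene = "周伯通嬉笑着打量着柳青烟的药圃，不时摘取几片草药藏在身后。柳青烟淡然自若，手中轻抚药材，一边默默准备解药，只眼角带着无奈的笑意。一股淡淡的药香飘过，竹林间响起了清脆的鸟鸣，好似为二人的奇妙互动伴奏。"
emotion = "快乐: 10, 悲伤: 0, 厌恶: 0, 恐惧: 1, 惊讶: 2, 愤怒: 0"  # happiness, sadness, disgust, fear, surprise, anger
chat_role = "柳青烟"  # Liu Qingyan
relationship = "6"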

system_prompt = system_prompt_temp.format(
    role_name=role_name,
    world=world,
    character=character,
    MBTI=MBTI,
    style=style,
    scene=scene,
    emotion=emotion,
    chat_role=chat_role,
    relationship=relationship
)

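# English: "Brother Zhou, as I see it, those herbs are not fit for medicine; better to choose with care, lest you harm yourself."
prompt = "周兄，依我所见，那几味草药非入药之宜，倒不如小心选取，莫要误伤自身。"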

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
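
The system prompt expects emotion scores and the relationship level on a 0-10 scale. The sketch below shows one way to build and validate those two fields before formatting the template. The helper names are hypothetical, and the list of six emotions is an assumption taken from the example values in the Quickstart rather than a documented specification; missing emotions default to 0 as an illustrative choice.

```python
# Emotion labels as they appear in the Quickstart example:
# happiness, sadness, disgust, fear, surprise, anger.
EMOTIONS = ("快乐", "悲伤", "厌恶", "恐惧", "惊讶", "愤怒")


def check_score(name: str, value: int) -> int:
    """Raise ValueError unless the score is an integer between 0 and 10."""
    if not isinstance(value, int) or not 0 <= value <= 10:
        raise ValueError(f"{name} must be an integer in 0-10, got {value!r}")
    return value


def format_emotion(scores: dict) -> str:
    """Render scores in the 'label: value, label: value' form used by the prompt."""
    parts = []
    for label in EMOTIONS:
        value = check_score(label, scores.get(label, 0))  # missing labels default to 0
        parts.append(f"{label}: {value}")
    return ", ".join(parts)


def format_relationship(level: int) -> str:
    """Render the intimacy level as the string expected by the template."""
    return str(check_score("relationship", level))
```

The returned strings can be passed as the `emotion` and `relationship` arguments of `system_prompt_temp.format(...)`.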

Note: The examples for Qwen2-7B-BD-RP use Chinese role-playing. For English examples, please refer to our other trained model repository, Mistral-Nemo-BD-RP.

Evaluation 🏆

We use objective questions to assess eight dimensions: Character, Style, Emotion, Relationship, Personality, Human-likeness, Coherence, and Role Consistency. The metric design can be found in our paper, and the evaluation code can be found on GitHub. The results are shown below:

| Model | Character ↑ | Style ↑ | Emotion ↓ | Relationship ↓ | Personality ↑ | Avg. ↑ | Human-likeness ↑ | Role Choice ↑ | Coherence ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General Baselines (Proprietary) | | | | | | | | | |
| GPT-4o | 74.32 ± 1.15 | 81.67 ± 1.51 | 16.31 ± 0.48 | 12.13 ± 0.66 | 66.58 ± 4.41 | 78.83 ± 1.64 | 67.33 ± 3.95 | 87.33 ± 3.86 | 99.67 ± 0.33 |
| GPT-3.5-Turbo | 72.26 ± 1.27 | 73.66 ± 1.73 | 17.79 ± 0.56 | 14.17 ± 0.73 | 66.92 ± 4.85 | 76.18 ± 1.83 | 33.33 ± 4.43 | 83.00 ± 4.68 | 97.33 ± 1.17 |
| Moonshot-v1-8k | 74.06 ± 1.19 | 80.64 ± 1.51 | 16.17 ± 0.47 | 13.42 ± 0.70 | 67.00 ± 4.87 | 78.42 ± 1.75 | 44.00 ± 4.33 | 86.67 ± 3.75 | 99.33 ± 0.46 |
| Yi-Large-Turbo | 75.13 ± 1.22 | 79.18 ± 1.58 | 16.44 ± 0.49 | 13.48 ± 0.67 | 68.25 ± 4.61 | 78.53 ± 1.72 | 47.00 ± 4.60 | 84.33 ± 3.67 | 92.67 ± 2.39 |
| Deepseek-Chat | 75.46 ± 1.14 | 81.49 ± 1.51 | 15.92 ± 0.46 | 12.42 ± 0.63 | 67.92 ± 4.57 | 79.30 ± 1.66 | 52.33 ± 4.95 | 83.00 ± 4.68 | 96.67 ± 1.00 |
| Baichuan4 | 71.82 ± 1.25 | 76.92 ± 1.52 | 17.57 ± 0.52 | 12.30 ± 0.62 | 67.08 ± 4.75 | 77.19 ± 1.73 | 45.33 ± 4.31 | 82.33 ± 4.49 | 99.33 ± 0.46 |
| Hunyuan | 73.77 ± 1.18 | 78.75 ± 1.56 | 17.24 ± 0.48 | 13.22 ± 0.68 | 67.00 ± 4.39 | 77.81 ± 1.66 | 53.00 ± 4.29 | 84.33 ± 4.52 | 98.33 ± 0.84 |
| Role-play Expertise Baselines | | | | | | | | | |
| Index-1.9B-Character | 73.33 ± 1.32 | 76.48 ± 1.50 | 17.99 ± 0.53 | 13.58 ± 0.71 | 66.33 ± 4.57 | 76.92 ± 1.73 | 21.67 ± 3.96 | 78.67 ± 5.14 | 69.67 ± 3.85 |
| CharacterGLM-6B | 73.36 ± 1.28 | 76.08 ± 1.55 | 18.58 ± 0.55 | 14.27 ± 0.79 | 67.33 ± 4.34 | 76.79 ± 1.70 | 16.00 ± 2.38 | 81.00 ± 4.40 | 25.67 ± 3.48 |
| Baichuan-NPC-Turbo | 75.19 ± 1.23 | 79.15 ± 1.38 | 17.24 ± 0.51 | 13.10 ± 0.69 | 65.33 ± 4.84 | 77.87 ± 1.73 | 56.00 ± 4.66 | 86.33 ± 4.90 | 99.00 ± 0.56 |
| General Baselines (Open-source) | | | | | | | | | |
| Yi-1.5-9B-Chat | 75.31 ± 1.20 | 76.78 ± 1.49 | 16.67 ± 0.52 | 12.75 ± 0.66 | 67.42 ± 4.63 | 78.02 ± 1.70 | 38.67 ± 4.39 | 84.00 ± 4.61 | 92.67 ± 1.79 |
| GLM-4-9b-chat | 74.26 ± 1.19 | 78.40 ± 1.55 | 17.18 ± 0.50 | 14.48 ± 0.74 | 67.17 ± 4.93 | 77.63 ± 1.78 | 47.67 ± 4.25 | 83.33 ± 4.51 | 99.33 ± 0.46 |
| Mistral-Nemo-Instruct-2407 | 74.12 ± 1.17 | 77.04 ± 1.48 | 17.00 ± 0.43 | 13.50 ± 0.67 | 67.00 ± 4.30 | 77.53 ± 1.61 | 53.67 ± 4.66 | 82.67 ± 4.77 | 74.33 ± 3.77 |
| Qwen2-7B-Instruct | 75.39 ± 1.13 | 77.68 ± 1.65 | 17.64 ± 0.56 | 13.43 ± 0.7 | 67.75 ± 4.44 | 77.95 ± 1.70 | 48.00 ± 4.66 | 83.33 ± 4.48 | 99.00 ± 0.56 |
| Qwen2-7B-BD-RP | 78.67 ± 1.12* | 82.52 ± 1.33* | 15.68 ± 0.5* | 11.22 ± 0.72* | 69.67 ± 4.27 | 80.79 ± 1.59* | 64.33 ± 3.80* | 87.33 ± 3.74 | 99.00 ± 0.56 |
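
The Avg. column mixes higher-is-better (↑) and lower-is-better (↓) metrics. The reported values appear consistent with averaging the five profile dimensions after inverting the two ↓ metrics as 100 minus the score. This formula is inferred from the numbers in the table, not stated in this card, so the sketch below should be read as a consistency check rather than the official definition (see the paper for that).

```python
def average_score(character, style, emotion, relationship, personality):
    """Mean of five dimensions, with the lower-is-better ones flipped to 100 - x."""
    aligned = [character, style, 100.0 - emotion, 100.0 - relationship, personality]
    return sum(aligned) / len(aligned)


# (Character, Style, Emotion, Relationship, Personality) and the reported Avg. from the table.
rows = {
    "GPT-4o": ((74.32, 81.67, 16.31, 12.13, 66.58), 78.83),
    "Qwen2-7B-Instruct": ((75.39, 77.68, 17.64, 13.43, 67.75), 77.95),
    "Qwen2-7B-BD-RP": ((78.67, 82.52, 15.68, 11.22, 69.67), 80.79),
}

for model_name, (scores, reported) in rows.items():
    computed = average_score(*scores)
    print(f"{model_name}: computed {computed:.2f}, reported {reported:.2f}")
```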

Citation 📖

Please cite our work if you find the resources in this repository useful:

@article{yu2024beyond,
  title   = {BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model},
  author  = {Yu, Yeyong and Yu, Runsheng and Wei, Haojie and Zhang, Zhanqiu and Qian, Quan},
  year    = {2024},
  journal = {arXiv preprint arXiv:2408.10903},
}

Acknowledgements 🥰

We would like to express our sincere gratitude to Tencent LightSpeed Studios for their invaluable support in this project. Their contributions and encouragement have been instrumental in the successful completion of our work.


GGUF

Model size: 8B params
Architecture: qwen2
Hardware compatibility (quantization bit widths): 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit

Model tree for QuantFactory/Qwen2-7B-BD-RP-GGUF

Base model: Qwen/Qwen2-7B
Finetuned: Qwen/Qwen2-7B-Instruct
Quantized (82): this model


Paper for QuantFactory/Qwen2-7B-BD-RP-GGUF

BEYOND DIALOGUE: A Profile-Dialogue Alignment Framework Towards General Role-Playing Language Model

Paper • 2408.10903 • Published Aug 20, 2024