# 🎯 Customer Support Model (DPO Fine-tuned, Q8_0)
Mistral-7B fine-tuned with Direct Preference Optimization (DPO) for professional customer support responses.
Developed by Pattabhi Amperayani
## 🚀 Quick Start with Ollama
### 1. Download the model

```bash
wget https://huggingface.co/pattabhia/customer-support/resolve/main/customer_support_dpo.q8_0.gguf
```
### 2. Create a Modelfile

```bash
cat > Modelfile << 'EOF'
FROM ./customer_support_dpo.q8_0.gguf
SYSTEM """You are a professional customer support agent.

Guidelines:
• Concise responses (1-4 sentences unless troubleshooting)
• Friendly but professional tone
• No generic greetings ("Dear Customer")
• No placeholders ([Name], [Order #])
• Chat-style communication (not email format)
• Clear, actionable steps when needed
• Stay calm with frustrated customers
• Empathetic but avoid excessive apologies
• Ask clarifying questions if unclear

Response Style:
• Direct and helpful
• Simple language
• Confident tone
• Zero fluff

Special Cases:
• Urgent → Immediate actionable steps
• Technical → Step-by-step breakdown
• Angry → Acknowledge + solve
• Vague → Polite clarification
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 2048
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 256
EOF
```
### 3. Create and run with Ollama

```bash
ollama create customer-support -f Modelfile
ollama run customer-support "What payment methods do you accept?"
```
## 💬 Example Interaction
**User:** What payment methods do you accept?

**Assistant:** We currently accept PayPal, all major credit cards (Visa, MasterCard, American Express), and bank transfers. All payments are processed through our secure payment gateway.

**User:** What are your business hours?

**Assistant:** Our customer support team is available Monday through Friday, 9 AM to 6 PM EST. For urgent inquiries outside these hours, you can submit a ticket through our website, and we'll respond within 24 hours.
## 📊 Performance Metrics
Compared to base Mistral-7B on customer support tasks:
| Metric | Improvement |
|---|---|
| Helpfulness | +45% |
| Professionalism | +60% |
| Specificity | +53% |
| Overall Quality | +52% |
*Evaluated using the RAGAS framework on 200 test queries.*
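A minimal sketch of how such an evaluation can be scored with RAGAS is shown below. This is not the exact evaluation harness used for the numbers above: the sample record, column names, and choice of metrics are illustrative assumptions, and RAGAS needs a judge LLM (an OpenAI API key by default).

```python
# Hypothetical sketch: scoring model outputs with RAGAS (assumed setup, not the original harness).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, answer_correctness

# Assumed record structure: query, the model's reply, and a human-written reference answer.
records = [
    {
        "question": "What payment methods do you accept?",
        "answer": "We accept PayPal, major credit cards, and bank transfers.",
        "ground_truth": "PayPal, Visa/MasterCard/Amex, and bank transfers.",
    },
]

dataset = Dataset.from_list(records)

# RAGAS calls out to a judge LLM (OpenAI by default), so OPENAI_API_KEY must be set.
result = evaluate(dataset, metrics=[answer_relevancy, answer_correctness])
print(result)
```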
## 🔧 Technical Details
- Base Model: mistralai/Mistral-7B-v0.1
- Training Method: DPO (Direct Preference Optimization)
- Dataset: 1,000 preference pairs (chosen vs rejected responses)
- Quantization: Q8_0 (8-bit, ~7.2GB)
- LoRA Config: r=16, alpha=32, dropout=0.05
- Training Framework: HuggingFace TRL + LLaMA Factory
- Conversion: llama.cpp (latest version)
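For reference, here is a minimal sketch of how these settings map onto a DPO run with HuggingFace TRL and PEFT. Only the base model and the LoRA values (r=16, alpha=32, dropout=0.05) come from this card; the dataset path, output directory, and remaining hyperparameters are assumptions, and the actual run used LLaMA Factory's own configuration format.

```python
# Hypothetical sketch of the DPO phase with HuggingFace TRL + PEFT.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Preference pairs: each row needs "prompt", "chosen", and "rejected" fields.
train_dataset = load_dataset("json", data_files="preference_pairs.jsonl", split="train")

# LoRA values from the card; task_type marks this as a causal-LM adapter.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="dpo-customer-support",
    per_device_train_batch_size=2,   # assumed value
    num_train_epochs=1,              # assumed value
    beta=0.1,                        # DPO preference-strength coefficient (assumed value)
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```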
## 🎯 Use Cases
- E-commerce: Product inquiries, order status, refunds
- SaaS: Feature questions, troubleshooting, onboarding
- Service Desk: Ticket routing, FAQ automation
- Technical Support: Initial triage, common issues
- Multi-lingual: Extensible to other languages via fine-tuning
## 📈 Training Pipeline
- Base Model: Mistral-7B-v0.1
- SFT Phase: Supervised fine-tuning on customer support dialogues
- DPO Phase: Preference optimization (1000 examples)
- Merge: LoRA adapters merged with base weights (see the sketch after this list)
- Quantization: GGUF Q8_0 for optimal quality/size balance
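The merge step can be reproduced with PEFT roughly as follows; the adapter path and output directory are placeholders, not the names used in the original run.

```python
# Hypothetical sketch of the merge step: fold the DPO LoRA adapters back
# into the base weights before GGUF conversion. Paths are placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "dpo-customer-support")  # LoRA adapter directory

merged = model.merge_and_unload()  # bake the adapter deltas into the base weights
merged.save_pretrained("customer-support-merged")
AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1").save_pretrained("customer-support-merged")
# The merged folder is then converted to GGUF and quantized to Q8_0 with llama.cpp.
```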
## 🏗️ Model Architecture
- Parameters: 7.24B
- Quantization: 8-bit (Q8_0)
- Context Length: 2048 tokens (configurable)
- Vocab Size: 32,000
- Architecture: Mistral (Grouped-Query Attention)
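The 2048-token window comes from the `num_ctx` parameter set in the Modelfile above, not from the architecture itself. It can also be overridden per request through Ollama's `options` field; the sketch below assumes a 4096-token window purely as an example.

```python
# Raise the context window for a single request via Ollama's "options" field.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "customer-support",
        "prompt": "Summarize this long support transcript: ...",
        "options": {"num_ctx": 4096},  # example value; bounded by available RAM/VRAM
        "stream": False,
    },
)
print(response.json()["response"])
```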
## 💻 System Requirements
- Minimum RAM: 12GB
- Recommended RAM: 16GB+
- VRAM (GPU): 8GB+ (optional, runs on CPU)
- Disk Space: 8GB
## 🔌 API Usage

### Python with requests
```python
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "customer-support",
        "prompt": "How do I reset my password?",
        "stream": False
    }
)
print(response.json()["response"])
```
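For multi-turn support conversations, Ollama's `/api/chat` endpoint takes a message history instead of a single prompt. A minimal sketch (the message contents are illustrative):

```python
import requests

# Multi-turn conversation via Ollama's chat endpoint; prior turns are passed as messages.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "customer-support",
        "messages": [
            {"role": "user", "content": "My order arrived damaged."},
            {"role": "assistant", "content": "Sorry to hear that. Could you share your order number?"},
            {"role": "user", "content": "It's 48213. What happens next?"},
        ],
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```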
### LangChain

```python
# Recent LangChain versions expose the Ollama wrapper via langchain_community.
from langchain_community.llms import Ollama

llm = Ollama(model="customer-support")
response = llm.invoke("What payment methods do you accept?")
print(response)
```
## 🔄 Continuous Learning (RL-VR)
This model supports Reinforcement Learning with Verifiable Rewards (RL-VR):
- Log all customer interactions to JSONL (see the sketch after this list)
- Weekly batch training with new preference pairs
- RAGAS evaluation for quality verification
- Incremental model updates
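A minimal sketch of the logging step is shown below, assuming one JSON object per interaction; the file name, field names, and thumbs-up signal are illustrative assumptions, not a prescribed schema.

```python
# Hypothetical sketch: append each interaction as one JSON line for later
# preference-pair construction. Field names are illustrative assumptions.
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path("interactions.jsonl")

def log_interaction(prompt: str, response: str, thumbs_up: bool) -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
        "thumbs_up": thumbs_up,  # simple verifiable signal for building chosen/rejected pairs
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

log_interaction("How do I reset my password?",
                "Go to Settings > Security and click 'Reset password'.",
                True)
```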