🎯 Customer Support Model (DPO Fine-tuned, Q8_0)

Mistral-7B fine-tuned with Direct Preference Optimization (DPO) for professional customer support responses.

Developed by Pattabhi Amperayani

🚀 Quick Start with Ollama

1. Download the model

wget https://huggingface.co/pattabhia/customer-support/resolve/main/customer_support_dpo.q8_0.gguf

2. Create Modelfile

cat > Modelfile << 'EOF'
FROM ./customer_support_dpo.q8_0.gguf

SYSTEM """You are a professional customer support agent.

Guidelines:
• Concise responses (1-4 sentences unless troubleshooting)
• Friendly but professional tone
• No generic greetings ("Dear Customer")
• No placeholders ([Name], [Order #])
• Chat-style communication (not email format)
• Clear, actionable steps when needed
• Stay calm with frustrated customers
• Empathetic but avoid excessive apologies
• Ask clarifying questions if unclear

Response Style:
• Direct and helpful
• Simple language
• Confident tone
• Zero fluff

Special Cases:
• Urgent → Immediate actionable steps
• Technical → Step-by-step breakdown
• Angry → Acknowledge + solve
• Vague → Polite clarification
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER num_ctx 2048
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 256
EOF

3. Create and run with Ollama

ollama create customer-support -f Modelfile
ollama run customer-support "What payment methods do you accept?"
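The Modelfile PARAMETER lines above only set defaults; the Ollama REST API also accepts per-request overrides via an `options` field. A minimal sketch of such a request payload (the override values here are illustrative, not recommendations):

```python
import json

# Per-request overrides for the defaults baked into the Modelfile
payload = {
    "model": "customer-support",
    "prompt": "How do I update my billing address?",
    "stream": False,
    "options": {"temperature": 0.3, "num_ctx": 4096},  # overrides PARAMETER defaults
}
body = json.dumps(payload)  # POST this to http://localhost:11434/api/generate
```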

💬 Example Interaction

User: What payment methods do you accept?

Assistant: We currently accept PayPal, all major credit cards (Visa, MasterCard, American Express), and bank transfers. All payments are processed through our secure payment gateway.

User: What are your business hours?

Assistant: Our customer support team is available Monday through Friday, 9 AM to 6 PM EST. For urgent inquiries outside these hours, you can submit a ticket through our website, and we'll respond within 24 hours.

📊 Performance Metrics

Compared to base Mistral-7B on customer support tasks:

| Metric          | Improvement |
|-----------------|-------------|
| Helpfulness     | +45%        |
| Professionalism | +60%        |
| Specificity     | +53%        |
| Overall Quality | +52%        |

Evaluated using the RAGAS framework on 200 test queries
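The figures above are relative gains over the base model's scores on the same queries. A minimal sketch of how such a relative delta is computed (the 0-1 scores below are hypothetical, not the actual RAGAS numbers):

```python
def relative_improvement(base_score, tuned_score):
    """Relative gain of the tuned model over the base, as a percent."""
    return round((tuned_score - base_score) / base_score * 100)

# e.g. a hypothetical overall score rising from 0.50 to 0.76 is a +52% gain
print(relative_improvement(0.50, 0.76))  # → 52
```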

🔧 Technical Details

  • Base Model: mistralai/Mistral-7B-v0.1
  • Training Method: DPO (Direct Preference Optimization)
  • Dataset: 1,000 preference pairs (chosen vs rejected responses)
  • Quantization: Q8_0 (8-bit, ~7.2GB)
  • LoRA Config: r=16, alpha=32, dropout=0.05
  • Training Framework: HuggingFace TRL + LLaMA Factory
  • Conversion: llama.cpp (latest version)

🎯 Use Cases

  • E-commerce: Product inquiries, order status, refunds
  • SaaS: Feature questions, troubleshooting, onboarding
  • Service Desk: Ticket routing, FAQ automation
  • Technical Support: Initial triage, common issues
  • Multi-lingual: Extensible to other languages via fine-tuning

📈 Training Pipeline

  1. Base Model: Mistral-7B-v0.1
  2. SFT Phase: Supervised fine-tuning on customer support dialogues
  3. DPO Phase: Preference optimization (1000 examples)
  4. Merge: LoRA adapters merged with base weights
  5. Quantization: GGUF Q8_0 for optimal quality/size balance
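Each example in the DPO phase is a (prompt, chosen, rejected) triple. A minimal sketch of the record format that TRL's DPOTrainer consumes as JSONL (the example texts are illustrative, not drawn from the actual dataset):

```python
import json

def make_preference_pair(prompt, chosen, rejected):
    """One DPO record in the prompt/chosen/rejected format TRL's DPOTrainer expects."""
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

pair = make_preference_pair(
    "Where is my order?",  # hypothetical example
    "Your order shipped yesterday; the tracking link was emailed to you.",
    "Dear Customer, we apologize for any inconvenience caused...",
)
line = json.dumps(pair)  # one line of the 1,000-pair JSONL training set
```

The "chosen" side rewards the concise chat style from the system prompt; the "rejected" side exhibits the generic-greeting pattern the guidelines forbid.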

πŸ—οΈ Model Architecture

  • Parameters: 7.24B
  • Quantization: 8-bit (Q8_0)
  • Context Length: 2048 tokens (configurable)
  • Vocab Size: 32,000
  • Architecture: Mistral (Grouped-Query Attention)

💻 System Requirements

  • Minimum RAM: 12GB
  • Recommended RAM: 16GB+
  • VRAM (GPU): 8GB+ (optional, runs on CPU)
  • Disk Space: 8GB

Python with requests

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "customer-support",
        "prompt": "How do I reset my password?",
        "stream": False
    }
)
print(response.json()["response"])
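For interactive UIs you would typically stream instead of waiting for the full reply. A minimal sketch of reassembling a streamed response, assuming Ollama's NDJSON streaming format (one JSON object per line, text fragments in `response`, a final object with `done: true`); the `join_stream` helper and demo lines are illustrative:

```python
import json

def join_stream(ndjson_lines):
    """Reassemble a full reply from Ollama's streamed NDJSON lines."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        if chunk.get("response"):
            parts.append(chunk["response"])
        if chunk.get("done"):
            break
    return "".join(parts)

# In practice, feed this from requests.post(..., stream=True).iter_lines()
demo = ['{"response": "We accept ", "done": false}',
        '{"response": "PayPal.", "done": false}',
        '{"response": "", "done": true}']
print(join_stream(demo))  # → We accept PayPal.
```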

LangChain

from langchain_community.llms import Ollama  # pip install langchain-community

llm = Ollama(model="customer-support")
response = llm.invoke("What payment methods do you accept?")
print(response)

🔄 Continuous Learning (RL-VR)

This model's training pipeline supports continuous improvement via Reinforcement Learning with Verifiable Rewards (RL-VR):

  1. Log all customer interactions to JSONL
  2. Weekly batch training with new preference pairs
  3. RAGAS evaluation for quality verification
  4. Incremental model updates
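Step 1 above can be sketched as a simple append-only JSONL logger; the field names (`ts`, `prompt`, `response`, `rating`) are an assumed schema for illustration, not a fixed format:

```python
import json
import os
import tempfile
import time

def log_interaction(path, prompt, response, rating):
    """Append one customer interaction as a JSONL record for the weekly training batch."""
    record = {"ts": time.time(), "prompt": prompt,
              "response": response, "rating": rating}  # rating later drives preference pairing
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_path = os.path.join(tempfile.gettempdir(), "interactions.jsonl")
log_interaction(log_path, "How do I reset my password?",
                "Click 'Forgot password' on the login page.", rating=1)
```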