BART-HighlightSum (Merged Model)
Fine-tuned BART-Large on the HighlightSum dialogue summarization dataset (merged LoRA → full model)
Model type: Seq2Seq Summarization
Base model: facebook/bart-large-cnn
Dataset: HighlightSum (dialogue summarization)
Fine-tuning method: LoRA → merged into full FP16 BART
Model Summary
This model is a merged BART-Large fine-tuned on 2,000 training samples (with 200 validation samples) from the HighlightSum dataset. It produces concise, accurate summaries of multi-turn dialogues.
✅ LoRA fine-tuning
✅ LoRA weights merged into the base BART
✅ No PEFT required for inference
✅ Lightweight, fast, and deployment-ready
This version is recommended for production, as it achieves the highest ROUGE and BERTScore results among all variants (Baseline, LoRA, Merged).
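Because the adapter was merged ahead of time, inference needs only plain transformers. For reference, the merge itself follows PEFT's standard pattern; a minimal sketch, assuming a hypothetical local adapter directory `./bart-highlightsum-lora`:

```python
# Minimal sketch of merging a LoRA adapter into the base model with PEFT.
# The adapter path is hypothetical; point it at your own LoRA checkpoint.
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

base = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
lora = PeftModel.from_pretrained(base, "./bart-highlightsum-lora")  # hypothetical path

merged = lora.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("./bart-highlightsum-merged")  # plain BART; no PEFT at inference
```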
Performance/Evaluation Results
Evaluation on HighlightSum (200 validation samples)
The following results were obtained using 200 validation samples from the HighlightSum dataset.
Merged Model Performance
| Metric | Score |
|---|---|
| ROUGE-1 | 0.383 |
| ROUGE-2 | 0.179 |
| ROUGE-L | 0.301 |
| BERTScore (F1) | 0.335 |
| BLEU | 0.0014 |
Comparison with Baseline and LoRA Models
| Metric | Baseline BART | LoRA Model | Merged Model |
|---|---|---|---|
| ROUGE-1 | 0.275 | 0.337 | 0.383 |
| ROUGE-2 | 0.090 | 0.152 | 0.179 |
| ROUGE-L | 0.204 | 0.252 | 0.301 |
| BERTScore (F1) | 0.163 | 0.298 | 0.335 |
| BLEU | 0.0052 | 0.0111 | 0.0014 |
Conclusion
The merged model performs best, achieving the highest ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore among all variants. Its BLEU score is lower than the baseline's, but BLEU's reliance on exact n-gram matches makes it a weak signal for abstractive summaries. The merged model is therefore the recommended variant for deployment, inference, and user-facing applications.
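For reference, scores like those above can be computed with the evaluate library. A minimal sketch (the exact aggregation and tokenization settings behind the card's numbers are not reproduced here):

```python
# Sketch: scoring model summaries against references with `evaluate`.
import evaluate

predictions = ["They talk about what to get a boy as a gift but can't decide."]
references = ["They discuss gift ideas for someone's son."]

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")
bleu = evaluate.load("bleu")

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
print(bleu.compute(predictions=predictions, references=references))
```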
🧪 Example Input / Output
Using Example #1 from the HighlightSum dataset:
Dialogue
A: What are you getting him?
B: Something cool.
A: What about a Lego?
B: He is too old for that now.
A: What about a book?
B: He hates reading.
A: Then I give up. I have no idea what to get him.
Human Gold Summary
They discuss gift ideas for someone's son.
Merged Model Summary
They talk about what to get a boy as a gift but can't decide.
✅ The model captures the intent, context, and key meaning with improved fluency and coherence.
Intended Use
Suitable for
- Dialogue summarization
- Customer service chat compression
- Meeting note extraction
- Educational tools
Not suitable for
- Factual QA
- Domain-specific technical summaries without fine-tuning
- Safety-critical use
How to Use
Python Inference
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "dlaima/bart-highlightsum-merged"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = """A: Hi Tom, are you busy tomorrow afternoon?
B: I think I am. Why?
A: I want to go to the animal shelter.
B: For what?
A: I'm getting a puppy for my son."""

# Match the training-time limits: 768 input tokens, 192 summary tokens
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=768)
summary_ids = model.generate(**inputs, max_new_tokens=192)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
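The high-level pipeline API also works, since the merged weights load as a standard BART checkpoint:

```python
# Same model via the summarization pipeline.
from transformers import pipeline

summarizer = pipeline("summarization", model="dlaima/bart-highlightsum-merged")
dialogue = "A: Hi Tom, are you busy tomorrow afternoon?\nB: I think I am. Why?"
print(summarizer(dialogue, max_new_tokens=192)[0]["summary_text"])
```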
Training Details
- Method: LoRA (rank 8)
- Model: BART-Large
- Batch size: 8 (micro-batch 4 × grad-accumulation 2)
- Epochs: ~2.4 (capped by the 2,000-example subset)
- Max input length: 768 tokens
- Max summary length: 192 tokens
- Precision: FP16
- Optimizer: AdamW
- Learning rate: 3e-4
- Hardware: NVIDIA T4
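A sketch of the corresponding setup with peft and transformers follows; the hyperparameters come from the list above, while target_modules, lora_alpha, and lora_dropout are assumptions (common defaults for BART attention projections):

```python
# Sketch of the LoRA training configuration described above.
# target_modules, lora_alpha, and lora_dropout are assumptions, not confirmed values.
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=8,                                  # rank 8, as listed above
    lora_alpha=16,                        # assumption: common 2x-rank value
    lora_dropout=0.05,                    # assumption
    target_modules=["q_proj", "v_proj"],  # assumption: BART attention projections
)
model = get_peft_model(base, lora_config)

training_args = Seq2SeqTrainingArguments(
    output_dir="./bart-highlightsum-lora",
    per_device_train_batch_size=4,   # micro-batch 4
    gradient_accumulation_steps=2,   # effective batch size 8
    learning_rate=3e-4,
    num_train_epochs=2.4,
    fp16=True,
    optim="adamw_torch",             # AdamW
)
```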
Dataset: HighlightSum
A dataset of dialogue → summary pairs drawn from multiple conversational sources.
- Multi-turn dialogues
- Short, medium, and long conversations
- Realistic conversational structure
- Human-written summaries
Subset used here:
- 2,000 samples for training
- 200 samples for validation
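A sketch of selecting an equivalent subset with the datasets library; the Hub dataset ID below is a placeholder assumption, so substitute the actual HighlightSum source:

```python
# Sketch: loading and subsetting the dataset.
# "knkarthick/highlightsum" is an assumed Hub ID; replace with the real source.
from datasets import load_dataset

ds = load_dataset("knkarthick/highlightsum")
train = ds["train"].shuffle(seed=42).select(range(2000))  # 2,000 training samples
val = ds["validation"].select(range(200))                 # 200 validation samples
```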
Files Included in This Repo
| File | Description |
|---|---|
| pytorch_model.bin | Final merged FP16 BART model |
| config.json | Standard HuggingFace config |
| generation_config.json | Beam search config |
| tokenizer.json / vocab.json / merges.txt | Tokenizer files (BART uses BPE, not SentencePiece) |
| README.md | This model card |
Limitations & Recommendations
Limitations
- May over-compress very long dialogues
- Not designed for domain-specific jargon
- Occasionally omits rare names or details
- Not a factual QA model
- Can hallucinate minor details in complex dialogues
Recommendations
- Use merged model for production
- Apply additional fine-tuning for domain-specific tasks
- For full reproducibility, pin random seeds and the HF transformers version
- Consider quantization (INT8 or GGUF) for mobile or CPU deployment; see the sketch below
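As a starting point, PyTorch dynamic INT8 quantization works on the merged checkpoint for CPU inference (a minimal sketch; GGUF export requires a separate toolchain):

```python
# Sketch: dynamic INT8 quantization of the merged model for CPU inference.
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("dlaima/bart-highlightsum-merged")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8  # INT8-quantize the Linear layers
)
# `quantized` supports .generate() like the original model, on CPU
```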
Maintenance
This model will be updated as:
- Additional training data becomes available
- Larger LoRA variants are tested
- Better merging & evaluation pipelines are developed
Contact
For questions, improvements, or collaboration, feel free to reach out via GitHub or HuggingFace (@dlaima).