tooltuned-qwen-3.5-4b
TL;DR
LoRA (rank 16) fine-tune of Qwen/Qwen3.5-4B for tool-calling, trained on Salesforce/xlam-function-calling-60k with Unsloth + TRL.
- Result on BFCL V4: 79.0% (base 87.3%, -8.3pp)
- Adapter: sukhrobnurali/tooltuned-qwen-3.5-4b
- Format: bf16 LoRA adapter (Unsloth advises against 4-bit quant for Qwen 3.5)
BFCL V4 results
Gate disclosure (v1.0): This adapter is published below the +3pp BFCL gate defined in the project brief (delta -8.30pp on the in-tree V3 evaluator). The regression is concentrated in the `irrelevance`/`live_irrelevance` categories; see ADR 0006 for the locked diagnosis and the Phase 3.5 remediation spec (deferred for v1.0).
| Model | Overall accuracy |
|---|---|
| Base (Qwen/Qwen3.5-4B) | 87.3% |
| This adapter | 79.0% |
| Delta | -8.3pp |
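The gate comparison can be sketched in a few lines. This is a minimal illustration, assuming the gate is a simple "tuned minus base, in percentage points, must be at least +3" rule; the function names `delta_pp` and `passes_gate` are hypothetical and not taken from the project's evaluator.

```python
def delta_pp(base_acc: float, tuned_acc: float) -> float:
    """Percentage-point change from base to tuned (both inputs are percents)."""
    return round(tuned_acc - base_acc, 2)


def passes_gate(base_acc: float, tuned_acc: float, gate_pp: float = 3.0) -> bool:
    """True if the tuned model beats the base by at least gate_pp points.

    The +3pp default mirrors the gate named in the disclosure; the exact
    rule used by the project is an assumption here.
    """
    return delta_pp(base_acc, tuned_acc) >= gate_pp


overall = delta_pp(87.3, 79.0)
print(f"{overall:+.1f}pp")        # overall delta from the table
print(passes_gate(87.3, 79.0))    # whether the adapter clears the gate
```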
Per-category breakdown
| Category | Base | Tuned | Delta |
|---|---|---|---|
| irrelevance | 80.0% | 42.0% | -38.0pp |
| live_irrelevance | 98.0% | 78.0% | -20.0pp |
| live_multiple | 78.0% | 78.0% | +0.0pp |
| live_parallel | 81.2% | 68.8% | -12.5pp |
| live_parallel_multiple | 95.8% | 91.7% | -4.2pp |
| live_relevance | 66.7% | 77.8% | +11.1pp |
| live_simple | 80.0% | 74.0% | -6.0pp |
| multiple | 92.0% | 90.0% | -2.0pp |
| parallel | 88.0% | 88.0% | +0.0pp |
| parallel_multiple | 98.0% | 92.0% | -6.0pp |
| simple | 90.0% | 88.0% | -2.0pp |
n=458, evaluated 2026-05-13
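To see where the regression sits, the per-category deltas can be recomputed and ranked. The sketch below copies the rounded percentages from the table, so a recomputed delta may differ from the published one by 0.1pp (for example `live_parallel`), since the published deltas were presumably derived from unrounded accuracies.

```python
# (base %, tuned %) per category, copied from the table above.
results = {
    "irrelevance": (80.0, 42.0),
    "live_irrelevance": (98.0, 78.0),
    "live_multiple": (78.0, 78.0),
    "live_parallel": (81.2, 68.8),
    "live_parallel_multiple": (95.8, 91.7),
    "live_relevance": (66.7, 77.8),
    "live_simple": (80.0, 74.0),
    "multiple": (92.0, 90.0),
    "parallel": (88.0, 88.0),
    "parallel_multiple": (98.0, 92.0),
    "simple": (90.0, 88.0),
}

# Delta in percentage points, recomputed from the rounded table values.
deltas = {name: round(tuned - base, 1) for name, (base, tuned) in results.items()}

# Two largest regressions and every category that improved.
worst = sorted(deltas, key=deltas.get)[:2]
improved = [name for name, d in deltas.items() if d > 0]

print("largest regressions:", worst)
print("improved:", improved)
```

Ranking by delta makes the disclosure concrete: the two irrelevance categories account for the largest drops, while `live_relevance` is the only category that improves.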
Training data
- Sources: `xlam` (Salesforce/xlam-function-calling-60k)
- Samples used: 10,000
- Validation fraction: 0.05
- Held-out fraction: 0.05
- Thinking-mode strategy: `preserve` (preserves Qwen 3.5's default reasoning trace; xLAM rows have no `<think>` content, so the three strategies converge in practice)
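With 10,000 samples and the two 0.05 fractions above, the split works out to 9,000 training, 500 validation, and 500 held-out rows. The following is a minimal sketch of such a split, not the project's actual code: the helper `split_indices` is hypothetical, and reusing seed 42 from the hyperparameters for the shuffle is an assumption.

```python
import random


def split_indices(n: int, val_frac: float, heldout_frac: float, seed: int):
    """Shuffle row indices once, then carve off validation and held-out slices."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, leaves global state alone

    n_val = round(n * val_frac)
    n_heldout = round(n * heldout_frac)

    val = indices[:n_val]
    heldout = indices[n_val:n_val + n_heldout]
    train = indices[n_val + n_heldout:]
    return train, val, heldout


# Seed 42 is assumed to match the training seed; the source does not say so.
train, val, heldout = split_indices(10_000, val_frac=0.05, heldout_frac=0.05, seed=42)
print(len(train), len(val), len(heldout))
```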
Training procedure
Supervised fine-tuning via Unsloth's FastLanguageModel + TRL's SFTTrainer. Adapter only — base weights are frozen. Single A100 (40 GB), bf16, gradient checkpointing on.
Hyperparameters
| Knob | Value |
|---|---|
| `base_model` | `Qwen/Qwen3.5-4B` |
| `lora.rank` | `16` |
| `lora.alpha` | `32` |
| `lora.dropout` | `0.0` |
| `lora.target_modules` | `q_proj, k_proj, v_proj, o_proj` |
| `optimizer` | `adamw_8bit` |
| `learning_rate` | `0.0002` |
| `warmup_ratio` | `0.03` |
| `weight_decay` | `0.0` |
| `batch_size` | `16` |
| `grad_accum_steps` | `1` |
| `effective_batch_size` | `16` |
| `epochs` | `1` |
| `max_steps` | `n/a` |
| `max_seq_len` | `2048` |
| `packing` | `True` |
| `seed` | `42` |
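Two of the values above are derived rather than independent. The sketch below shows the arithmetic, assuming the standard LoRA convention in which the low-rank update is scaled by `alpha / rank` (rank-stabilized variants scale differently). The `RunConfig` class is a hypothetical container for illustration, not the project's config schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical subset of the hyperparameters listed in the table."""
    lora_rank: int = 16
    lora_alpha: int = 32
    batch_size: int = 16
    grad_accum_steps: int = 1

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer step on a single device.
        return self.batch_size * self.grad_accum_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


cfg = RunConfig()
print(cfg.effective_batch_size)  # batch_size * grad_accum_steps
print(cfg.lora_scaling)          # alpha / rank
```

Because packing is enabled, each of the 16 sequences in a batch may hold several concatenated training examples, so the number of examples per optimizer step can exceed the effective batch size.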
Intended use
Function calling / tool use in chat agents. The adapter pairs with the base Qwen 3.5 4B chat template; pass tool schemas in the system prompt and the model emits <tool_call> blocks (or XML-tagged <function=...> calls; the inference helper parses both).
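As an illustration of handling both output styles, here is a minimal parser sketch. It is not the project's inference helper. The exact wire formats are assumptions: a JSON object with `name` and `arguments` keys inside `<tool_call>` tags, and `<parameter=...>` children inside `<function=...>` for the XML style. The names `parse_tool_calls` and `_parse_xml_calls` are hypothetical.

```python
import json
import re

# Assumed formats (not confirmed by the model card):
#   <tool_call>{"name": "...", "arguments": {...}}</tool_call>
#   <function=name><parameter=key>value</parameter></function>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>\s*(.*?)\s*</parameter>", re.DOTALL)


def _parse_xml_calls(text: str) -> list:
    """Extract XML-tagged calls; parameter values are kept as raw strings."""
    calls = []
    for name, body in FUNCTION_RE.findall(text):
        arguments = dict(PARAM_RE.findall(body))
        calls.append({"name": name, "arguments": arguments})
    return calls


def parse_tool_calls(text: str) -> list:
    """Return every tool call found in a model response, in either style."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(block)
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict) and "name" in payload:
            calls.append({"name": payload["name"],
                          "arguments": payload.get("arguments", {})})
        else:
            # Not JSON: the block may wrap an XML-tagged call instead.
            calls.extend(_parse_xml_calls(block))
    # Also accept XML-tagged calls that appear outside any <tool_call> block.
    calls.extend(_parse_xml_calls(TOOL_CALL_RE.sub("", text)))
    return calls


json_style = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
xml_style = "<function=get_weather>\n<parameter=city>\nParis\n</parameter>\n</function>"
print(parse_tool_calls(json_style))
print(parse_tool_calls(xml_style))
```

A response with no recognizable call yields an empty list, which is the outcome a caller would expect when none of the supplied tools is relevant to the request.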
Out of scope
- Non-English instruction following (xLAM is English-only).
- Long-context tool dialogues beyond 2,048 tokens — the adapter was trained at that sequence length.
- Safety-critical decisions. The adapter inherits Qwen 3.5's safety profile; no additional alignment was applied.
Limitations
- LoRA rank 16 is a known-safe default, not an ablated optimum. Higher ranks may move the BFCL number further; rank ablations are a stretch goal.
- BFCL covers only one slice of tool-calling behavior; performance on task families outside that distribution (multi-turn agentic loops, fully novel APIs) is not directly measured.
License
Apache-2.0, matching the base model Qwen/Qwen3.5-4B.
Reproduction
Source: https://github.com/sukhrobnurali/tooltuned-qwen. Pinned versions live in pyproject.toml; the lockfile (uv.lock) is the reproducibility contract.
git clone https://github.com/sukhrobnurali/tooltuned-qwen
cd tooltuned-qwen
uv sync
# Run on Colab Pro A100; see notebooks/colab_main.ipynb
Training curves: https://wandb.ai/sukhrob-production/tooltuned-qwen.
Citation
@misc{nurali_tooltuned_qwen_2026,
author = {Sukhrob Nurali},
title = {tooltuned-qwen-3.5-4b: a tool-calling LoRA for Qwen 3.5 4B},
year = {2026},
howpublished = {\url{https://huggingface.co/sukhrobnurali/tooltuned-qwen-3.5-4b}}
}
Author
- Sukhrob Nurali (sukhrobnurali@gmail.com)
- Hugging Face: sukhrobnurali
- GitHub: sukhrobnurali