Corch V13 Balanced: Task Routing Foundation Model

87.30% Average Accuracy | Perfect Domain & Capability Classification

A multi-task foundation model for intelligent software engineering task routing, achieving breakthrough performance through balanced synthetic data generation.

Model Description

Corch V13 Balanced is an 805K-parameter neural network that classifies software engineering tasks along four dimensions:

  1. Domain (19 classes): frontend, backend, machine_learning, etc. - 100% accuracy 🎯
  2. Capability (8 classes): code_generation, debugging, testing, etc. - 100% accuracy 🎯
  3. Strategy (2 classes): DIRECT vs ORCHESTRATE - 85.98% accuracy
  4. Execution Type (5 classes): single_task, multi_step, etc. - 63.20% accuracy

Performance

| Task       | Accuracy    | Improvement from V10 |
|------------|-------------|----------------------|
| Average    | 87.30%      | +20.46%              |
| Domain     | 100.00% 🎯  | +14.59%              |
| Capability | 100.00% 🎯  | +39.61%              |
| Strategy   | 85.98%      | +12.55%              |
| Execution  | 63.20%      | +7.94%               |

Key Innovation: Balanced Synthetic Data

The breakthrough came from solving severe class imbalance (324:1 ratio):

  • Generated 49,307 synthetic examples using GPT-5-Pro
  • Balanced dataset to ~10K examples per domain
  • Eliminated rare class zero-accuracy problem

Before balancing:

  • machine_learning domain: 88 examples → 0% accuracy
  • other domain: 57 examples → 0% accuracy

After balancing:

  • All domains: ~10K examples → 100% accuracy ✅
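
A minimal sketch of the balancing step in Python (helper names and the exact downsampling strategy are assumptions; the real pipeline combined downsampled real data with the synthetic examples described above):

import random
from collections import defaultdict

TARGET_PER_DOMAIN = 10_000  # ~10K per domain, as described above

def balance_by_domain(examples, target=TARGET_PER_DOMAIN, seed=42):
    # Group labeled examples by their domain field
    random.seed(seed)
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)
    balanced = []
    for domain, items in by_domain.items():
        if len(items) > target:
            # Over-represented domains are downsampled to the target
            balanced.extend(random.sample(items, target))
        else:
            # Rare domains (e.g. machine_learning, other) keep everything;
            # synthetic generation tops them up toward the target separately
            balanced.extend(items)
    random.shuffle(balanced)
    return balanced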

Architecture

Input Text → BGE-large-en-v1.5 Embedding (1024d)
            ↓
Shared Layers:
  - Linear(1024 → 512) + ReLU + Dropout(0.3)
  - Linear(512 → 512) + ReLU + Dropout(0.3)
            ↓
Task-Specific Heads:
  ├─ Strategy Head → Linear(512 → 2)
  ├─ Capability Head → Linear(512 → 8)
  ├─ Domain Head → Linear(512 → 19)
  └─ Execution Head → Linear(512 → 5)

Parameters: 804,898
Training Time: ~1 minute (30 epochs, with early stopping)
Hardware: AMD MI300X GPU
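
The quoted parameter count follows directly from the layer shapes above:

# Shared trunk: (1024*512 + 512) + (512*512 + 512) = 524,800 + 262,656 = 787,456
# Heads: (512*2 + 2) + (512*8 + 8) + (512*19 + 19) + (512*5 + 5)
#      = 1,026 + 4,104 + 9,747 + 2,565 = 17,442
# Total: 787,456 + 17,442 = 804,898
# Runnable check (FoundationModelV13 is defined in the Usage section below):
print(sum(p.numel() for p in FoundationModelV13().parameters()))  # 804898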

Usage

import torch
from transformers import AutoTokenizer, AutoModel

# Load BGE embedding model
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
embedding_model = AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
embedding_model.eval()  # inference mode: disables dropout in the encoder

# Load Corch V13 Balanced model
from huggingface_hub import hf_hub_download
model_path = hf_hub_download(repo_id="bledden/corch-v13-balanced", filename="model_v13_balanced.pt")

# Define the architecture (must match training exactly for the state dict to load)
class FoundationModelV13(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = torch.nn.Sequential(
            torch.nn.Linear(1024, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(512, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3)
        )
        self.strategy_head = torch.nn.Linear(512, 2)
        self.capability_head = torch.nn.Linear(512, 8)
        self.domain_head = torch.nn.Linear(512, 19)
        self.execution_head = torch.nn.Linear(512, 5)
    
    def forward(self, x):
        shared = self.shared(x)
        return {
            'strategy': self.strategy_head(shared),
            'capability': self.capability_head(shared),
            'domain': self.domain_head(shared),
            'execution': self.execution_head(shared)
        }

model = FoundationModelV13()
checkpoint = torch.load(model_path, weights_only=True)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Embed and predict
def route_task(task_text):
    # Embed with BGE; the [CLS] token embedding is the sentence representation
    inputs = tokenizer(task_text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        embedding = embedding_model(**inputs).last_hidden_state[:, 0, :]
    
    # Get predictions
    with torch.no_grad():
        outputs = model(embedding)
    
    strategy = ["DIRECT", "ORCHESTRATE"][outputs['strategy'].argmax().item()]
    capability = ["code_generation", "debugging", "documentation", "optimization",
                  "refactoring", "testing", "design", "data_analysis"][outputs['capability'].argmax().item()]
    domain = ["frontend", "backend", "data_processing", "machine_learning", "devops",
              "testing", "security", "mobile", "data_engineering", "cloud", "database",
              "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded", 
              "other"][outputs['domain'].argmax().item()]
    execution = ["single_task", "multi_step", "iterative", "parallel", 
                 "sequential"][outputs['execution'].argmax().item()]
    
    return {
        "strategy": strategy,
        "capability": capability,
        "domain": domain,
        "execution_type": execution
    }

# Example
result = route_task("Build a CNN image classifier using PyTorch for medical imaging")
print(result)
# {
#   'strategy': 'ORCHESTRATE',
#   'capability': 'code_generation',
#   'domain': 'machine_learning',
#   'execution_type': 'multi_step'
# }
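
For routing many tasks at once, the embedding and forward pass can be batched; a minimal sketch (the padding settings are an assumption):

def route_tasks(task_texts):
    # Tokenize the whole batch with padding so sequences align
    inputs = tokenizer(task_texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        embeddings = embedding_model(**inputs).last_hidden_state[:, 0, :]
        outputs = model(embeddings)
    # One predicted class index per task for each head; map indices to labels
    # with the same lists used in route_task above
    return {head: logits.argmax(dim=1).tolist()
            for head, logits in outputs.items()}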

Training Data

  • Training set: 31,592 examples (balanced)
  • Validation set: 3,495 examples
  • Synthetic examples: 49,307 (generated via GPT-5-Pro)
  • Real examples: ~550K (existing dataset)
  • Final dataset: Balanced to ~10K per domain

Synthetic Data Generation

Used GPT-5-Pro with domain-specific prompts:

Generate a realistic software engineering task for: {domain}
Required: {capability}, {execution_type}, {strategy}
Output: 1-3 sentence task description with realistic terminology

Cost: ~$500 for 49,307 examples
Quality: 100% unique (zero duplicates), with schema-validated outputs
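
A sketch of the generation loop using the OpenAI Python client (the model identifier, prompt wiring, and output handling here are assumptions, not the project's actual script):

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Generate a realistic software engineering task for: {domain}\n"
    "Required: {capability}, {execution_type}, {strategy}\n"
    "Output: 1-3 sentence task description with realistic terminology"
)

def generate_example(domain, capability, execution_type, strategy):
    # "gpt-5-pro" is assumed from the GPT-5-Pro reference above
    response = client.chat.completions.create(
        model="gpt-5-pro",
        messages=[{"role": "user", "content": PROMPT.format(
            domain=domain, capability=capability,
            execution_type=execution_type, strategy=strategy)}],
    )
    return {"task": response.choices[0].message.content.strip(),
            "domain": domain, "capability": capability,
            "execution_type": execution_type, "strategy": strategy}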

Label Mappings

Strategy (2): DIRECT, ORCHESTRATE
Capability (8): code_generation, debugging, documentation, optimization, refactoring, testing, design, data_analysis
Domain (19): frontend, backend, data_processing, machine_learning, devops, testing, security, mobile, data_engineering, cloud, database, api, ui_ux, general, iot, blockchain, game_dev, embedded, other
Execution (5): single_task, multi_step, iterative, parallel, sequential
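
For programmatic use, the same mappings as index-ordered lists (order assumed to match the model's output indices, as in the Usage code):

STRATEGY_LABELS = ["DIRECT", "ORCHESTRATE"]
CAPABILITY_LABELS = ["code_generation", "debugging", "documentation", "optimization",
                     "refactoring", "testing", "design", "data_analysis"]
DOMAIN_LABELS = ["frontend", "backend", "data_processing", "machine_learning", "devops",
                 "testing", "security", "mobile", "data_engineering", "cloud", "database",
                 "api", "ui_ux", "general", "iot", "blockchain", "game_dev", "embedded", "other"]
EXECUTION_LABELS = ["single_task", "multi_step", "iterative", "parallel", "sequential"]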

Comparison to Baselines

| Model               | Architecture | Data       | Avg Acc | Domain Acc |
|---------------------|--------------|------------|---------|------------|
| Logistic Regression | Single-task  | Imbalanced | 74.61%  | 74.61%     |
| V10                 | Multi-task   | Imbalanced | 66.84%  | 85.41%     |
| V13 Balanced        | Multi-task   | Balanced   | 87.30%  | 100.00%    |

Limitations

  • Execution type prediction (63.20%) still has room for improvement
  • Context-independent (doesn't use conversation history yet)
  • English-only
  • Focused on software engineering tasks

Citation

@software{corch_v13_balanced_2024,
  title = {Corch V13 Balanced: Task Routing Foundation Model},
  author = {Bledden, Team},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/bledden/corch-v13-balanced},
  note = {87.30% accuracy via balanced synthetic data generation}
}

License

MIT License

Built with ❤️ by the Corch Team | Powered by balanced synthetic data generation
